INTRODUCTION

Technical Field

The field of this invention concerns segment polarity genes and their uses.

Background

Segment polarity genes were discovered in flies as mutations which change
the pattern of structures of the body segments. Mutations in the genes cause animals
to develop the changed patterns on the surfaces of body segments, the changes
affecting the pattern along the head-to-tail axis. For example, mutations in the gene
patched cause each body segment to develop without the normal structures in the
center of each segment. In their stead is a mirror image of the pattern normally
found in the anterior segment. Thus cells in the center of the segment make the
wrong structures, and point them in the wrong direction with reference to the
overall head-to-tail polarity of the animal. About sixteen genes in the class are known.
The encoded proteins include kinases, transcription factors, a cell junction protein,
two secreted proteins called wingless (WG) and hedgehog (HH), a single
transmembrane protein called patched (PTC), and some novel proteins not related to
any known protein. All of these proteins are believed to work together in signaling
pathways that inform cells about their neighbors in order to set cell fates and
polarities.

Many of the segment polarity proteins of Drosophila and other invertebrates
are closely related to vertebrate proteins, implying that the molecular mechanisms
involved are ancient. Among the vertebrate proteins related to the fly genes are
En-1 and En-2, which act in vertebrate brain development, and WNT-1, which is also
involved in brain development, but was first found as the oncogene implicated in
many cases of mouse breast cancer. In flies, the patched gene is transcribed into
RNA in a complex and dynamic pattern in embryos, including fine transverse stripes
in each body segment primordium. The encoded protein is predicted to contain
many transmembrane domains. It has no significant similarity to any other known
protein. Other proteins having large numbers of transmembrane domains include a
variety of membrane receptors, channels through membranes and transporters
through membranes.

The hedgehog (HH) protein of flies has been shown to have at least three
vertebrate relatives: Sonic hedgehog (Shh), Indian hedgehog, and Desert hedgehog.
Shh is expressed in a group of cells at the posterior of each developing limb
bud. This is exactly the same group of cells found to have an important role in
signaling polarity to the developing limb. The signal appears to be graded, with
cells close to the posterior source of the signal forming posterior digits and other
limb structures and cells farther from the signal source forming more anterior
structures. It has been known for many years that transplantation of the signaling
cells, a region of the limb bud known as the "zone of polarizing activity" (ZPA), has
dramatic effects on limb patterning. Implanting a second ZPA anterior to the limb
bud causes a limb to develop with posterior features replacing the anterior ones (in
essence little fingers instead of thumbs). Shh has been found to be the long sought
ZPA signal. Cultured cells making Shh protein (SHH), when implanted into the
anterior limb bud region, have the same effect as an implanted ZPA. This
establishes that Shh is clearly a critical trigger of posterior limb development.

The factor in the ZPA has been thought for some time to be related to
another important developmental signal that polarizes the developing spinal cord.
The notochord, a rod of mesoderm that runs along the dorsal side of early vertebrate
embryos, is a signal source that polarizes the neural tube along the dorsal-ventral
axis. The signal causes the part of the neural tube nearest to the notochord to form
floor plate, a morphologically distinct part of the neural tube. The floor plate, in
turn, sends out signals to the more dorsal parts of the neural tube to further
determine cell fates. The ZPA was reported to have the same signaling effect as the
notochord when transplanted to be adjacent to the neural tube, suggesting the ZPA
makes the same signal as the notochord. In keeping with this view, Shh was found
to be produced by notochord cells and floor plate cells. Tests of extra expression of
Shh in mice led to the finding of extra expression of floor plate genes in cells which
would not normally turn them on. Therefore Shh appears to be a component of the
signal from notochord to floor plate and from floor plate to more dorsal parts of the
neural tube. Besides limb and neural tubes, vertebrate hedgehog genes are also
expressed in many other tissues including, but not limited to, the peripheral nervous
system, brain, lung, liver, kidney, tooth primordia, genitalia, and hindgut and
foregut endoderm.

PTC has been proposed as a receptor for HH protein based on genetic
experiments in flies. A model for the relationship is that PTC acts through a largely
unknown pathway to inactivate both its own transcription and the transcription of the
wingless segment polarity gene. This model proposes that HH protein, secreted
from adjacent cells, binds to the PTC receptor, inactivates it, and thereby prevents
PTC from turning off its own transcription or that of wingless. A number of
experiments have shown coordinate events between PTC and HH.

Relevant Literature

Descriptions of patched, by itself or its role with hedgehog, may be found in
Hooper and Scott, Cell 59, 751-765 (1989); Nakano et al., Nature 341, 508-513
(1989) (both of which also describe the sequence for Drosophila patched); Simcox
et al., Development 107, 715-722 (1989); Hidalgo and Ingham, Development 110,
291-301 (1990); Phillips et al., Development 110, 105-114 (1990); Sampedro and
Guerrero, Nature 353, 187-190 (1991); Ingham et al., Nature 353, 184-187 (1991);
and Taylor et al., Mechanisms of Development 42, 89-96 (1993). Discussions of
the role of hedgehog include Riddle et al., Cell 75, 1401-1416 (1993); Echelard et
al., Cell 75, 1417-1430 (1993); Krauss et al., Cell 75, 1431-1444 (1993); Tabata
and Kornberg, Cell 76, 89-102 (1994); Heemskerk & DiNardo, Cell 76, 449-460
(1994); Roelink et al., Cell 76, 761-775 (1994); and a short review article by
Ingham, Current Biology 4, 347-350 (1994). The sequence for the Drosophila 5'
non-coding region was reported to GenBank, accession number M28418,
referred to in Hooper and Scott (1989), supra. See also Forbes et al.,
Development 1993 Supplement, 115-124.

SUMMARY OF THE INVENTION
Methods for isolating patched genes, particularly mammalian patched genes,
including the mouse and human patched genes, as well as invertebrate patched genes
and sequences, are provided. The methods include identification of patched genes
from other species, as well as members of the same family of proteins. The subject
genes provide methods for producing the patched protein, where the genes and
proteins may be used as probes for research, diagnosis, binding of hedgehog protein
for its isolation and purification, gene therapy, as well as other utilities.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a graph having a restriction map of about 10 kbp of the 5' region
upstream from the initiation codon of Drosophila patched gene and bar graphs of
constructs of truncated portions of the 5' region joined to β-galactosidase, where the
constructs are introduced into fly cell lines for the production of embryos. The
expression of β-gal in the embryos is indicated in the right-hand table during early
and late development of the embryo. The greater the number of +'s, the more
intense the staining.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

Methods are provided for identifying members of the patched (ptc) gene
family from invertebrate and vertebrate, e.g. mammalian, species, as well as the
entire cDNA sequence of the mouse and human patched gene. Also, sequences for
invertebrate patched genes are provided. The patched gene encodes a
transmembrane protein having a large number of transmembrane sequences.
In identifying the mouse and human patched genes, primers were employed
to move through the evolutionary tree from the known Drosophila ptc sequence.
Two primers are employed from the Drosophila sequence with appropriate
restriction enzyme linkers to amplify portions of genomic DNA of a related
invertebrate, such as mosquito. The sequences are selected from regions which are
not likely to diverge over evolutionary time and are of low degeneracy.
Conveniently, the regions are the N-terminal proximal sequence, generally within
the first 1.5 kb, usually within the first 1 kb, of the coding portion of the cDNA,
conveniently in the first hydrophilic loop of the protein. Employing the polymerase
chain reaction (PCR) with the primers, a band can be obtained from mosquito
genomic DNA. The band may then be amplified and used in turn as a probe. One
may use this probe to probe a cDNA library from an organism in a different branch
of the evolutionary tree, such as a butterfly. By screening the library and
identifying sequences which hybridize to the probe, a portion of the butterfly
patched gene may be obtained. One or more of the resulting clones may then be
used to rescreen the library to obtain an extended sequence, up to and including the
entire coding region, as well as the non-coding 5'- and 3'-sequences. As
appropriate, one may sequence all or a portion of the resulting cDNA coding
sequence.
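
The notion of "low degeneracy" used in selecting primer regions can be made concrete
by counting how many distinct oligonucleotides a degenerate primer represents. The
following is a minimal sketch only: the function name `primer_degeneracy`, the choice to
count inosine (I) as a single base, and the example strings are assumptions made for
illustration and are not part of the described method.

```python
# Number of plain bases each IUPAC nucleotide code stands for.
# "I" (inosine) is counted as 1 here (an assumption of this sketch): it is a
# single base that pairs broadly, so it does not multiply the oligo mixture.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer: str) -> int:
    """Return the number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_FOLD[base]  # raises KeyError on non-IUPAC characters
    return total


# Illustrative degenerate stretches (hypothetical inputs for this sketch).
print(primer_degeneracy("YTNGANTGYTTYTGGGA"))  # 2*4*4*2*2 = 128
print(primer_degeneracy("CIGGCCARTGCAT"))      # only R is degenerate: 2
```

A lower count means fewer primer species competing in the reaction, which is one
reason regions rich in Trp, Met, Cys and similar low-redundancy codons are attractive.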

One may then screen a genomic or cDNA library of a species higher in the
evolutionary scale with appropriate probes from one or both of the prior sequences.
Of particular interest is screening a genomic library of a distantly related
invertebrate, e.g. beetle, where one may use a combination of the sequences
obtained from the previous two species, in this case, the Drosophila and the
butterfly. By appropriate techniques, one may identify specific clones which bind to
the probes, which may then be screened for cross hybridization with each of the
probes individually. The resulting fragments may then be amplified, e.g. by
subcloning.

By having all or parts of the 4 different patched genes, in the presently
illustrated example, Drosophila (fly), mosquito, butterfly and beetle, one can now
compare the patched genes for conserved sequences. Cells from an appropriate
mammalian limb bud or other cells expressing patched, such as notochord, neural
tube, gut, lung buds, or other tissue, particularly fetal tissue, may be employed for
screening. Alternatively, adult tissue which produces patched may be employed for
screening. Based on the consensus sequence available from the 4 other species, one
can develop probes in which each site carries the nucleotide shared by at least 2 of
the sequences; where a site varies so that each species has a unique nucleotide,
inosine, which binds to all nucleotides, may be used.
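
The probe design rule just described (use the nucleotide shared by at least 2 of the
4 sequences, and inosine where every species differs) can be sketched as follows. This
is an illustrative sketch under stated assumptions: the function name `consensus_probe`
and the four short input sequences are hypothetical, the inputs are assumed to be
pre-aligned without gaps, and a 2-versus-2 tie is resolved in favor of the base seen
first, a case the text does not address.

```python
from collections import Counter


def consensus_probe(aligned):
    """Build a probe from equal-length aligned nucleotide sequences.

    A column contributes its most common base when at least two sequences
    share it, otherwise "I" (inosine). Ties go to the first base encountered
    (an assumption of this sketch).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    probe = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        probe.append(base if count >= 2 else "I")
    return "".join(probe)


# Hypothetical aligned fragments standing in for fly, mosquito, butterfly, beetle.
fragments = ["ATGCA", "ATGGA", "ACGTA", "ATCAA"]
print(consensus_probe(fragments))  # column 4 is C/G/T/A, so it becomes I: ATGIA
```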

Either PCR may be employed using primers or, if desired, a genomic library
from an appropriate source may be probed. With PCR, one may use a cDNA
library or use reverse transcriptase-PCR (RT-PCR), where mRNA is available from
the tissue. Usually, where fetal tissue is employed, one will employ tissue from the
first or second trimester, preferably the latter half of the first trimester or the second
trimester, depending upon the particular host. The age and source of tissue will
depend to a significant degree on the ability to surgically isolate the tissue based on
its size, the level of expression of patched in the cells of the tissue, the accessibility
of the tissue, the number of cells expressing patched and the like. The amount of
tissue available should be large enough so as to provide for a sufficient amount of
mRNA to be usefully transcribed and amplified. With mouse tissue, limb bud of
from about 10 to 15 dpc (days post conception) may be employed.

In the primers, the complementary binding sequence will usually be at least
14 nucleotides, preferably at least about 17 nucleotides and usually not more than
about 30 nucleotides. The primers may also include a restriction enzyme sequence
for isolation and cloning. With RT-PCR, the mRNA may be enriched in accordance
with known ways, reverse transcribed, followed by amplification with the
appropriate primers. (Procedures employed for molecular cloning may be found in
Molecular Cloning: A Laboratory Manual, Sambrook et al., eds., Cold Spring
Harbor Laboratories, Cold Spring Harbor, NY, 1988). Particularly, the primers may
conveniently come from the N-terminal proximal sequence or other conserved
region, such as those sequences where at least five amino acids are conserved out of
eight amino acids in three of the four sequences. This is illustrated by the sequences
(SEQ ID NO:11) HTPLDCFWEG, (SEQ ID NO:12) LIVGG, and (SEQ ID NO:13)
PFFWEQY. Resulting PCR products of expected size are subcloned and may be
sequenced if desired.
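
The conservation criterion above can be read as a sliding-window test over aligned
protein sequences. The sketch below adopts one possible reading, stated here as an
assumption: a window of eight aligned positions qualifies when at least five of its
positions show the same residue in at least three of the four sequences. The function
name `conserved_windows` and all input rows other than HTPLDCFWEG are hypothetical.

```python
from collections import Counter


def conserved_windows(aligned, window=8, min_conserved=5, min_agree=3):
    """Return start indices of windows meeting the conservation threshold.

    A position counts as conserved when at least `min_agree` sequences share
    the same residue there (one interpretation of the text, not the only one).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    conserved = [
        Counter(column).most_common(1)[0][1] >= min_agree
        for column in zip(*aligned)
    ]
    return [
        start
        for start in range(length - window + 1)
        if sum(conserved[start:start + window]) >= min_conserved
    ]


# First row is SEQ ID NO:11 from the text; the other rows are made-up variants.
rows = ["HTPLDCFWEG", "HTPLDCFWEG", "HTALDCFWEG", "QSRMNAYKDT"]
print(conserved_windows(rows))  # [0, 1, 2]
```

Windows returned by such a scan are candidates only; degeneracy of the corresponding
codons still has to be checked before a primer is designed.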

The cloned PCR fragment may then be used as a probe to screen a cDNA
library of mammalian tissue cells expressing patched, where hybridizing clones may
be isolated under appropriate conditions of stringency. Again, the cDNA library
should come from tissue which expresses patched, which tissue will come within the
limitations previously described. Clones which hybridize may be subcloned and
rescreened. The hybridizing subclones may then be isolated and sequenced or may
be further analyzed by employing RNA blots and in situ hybridizations in whole and
sectioned embryos. Conveniently, a fragment of from about 0.5 to 1 kbp of the
N-terminal coding region may be employed for the Northern blot.

The mammalian gene may be sequenced and, as described above, conserved
regions identified and used as primers for investigating other species. The
N-terminal proximal region, the C-terminal region or an intermediate region may be
employed for the sequences, where the sequences will be selected having minimum
degeneracy and the desired level of conservation over the probe sequence.

The DNA sequence encoding PTC may be cDNA or genomic DNA or a
fragment thereof, particularly complete exons from the genomic DNA; may be
isolated as the sequence substantially free of wild-type sequence from the
chromosome; may be a 50 kbp fragment or smaller fragment; and may be joined to
heterologous or foreign DNA, which may be a single nucleotide, an oligonucleotide of
up to 50 bp, which may be a restriction site or other identifying DNA for use as a
primer, probe or the like, or a nucleic acid of greater than 50 bp, where the nucleic
acid may be a portion of a cloning or expression vector, comprise the regulatory
regions of an expression cassette, or the like. The DNA may be isolated, purified
being substantially free of proteins and other nucleic acids, be in solution, or the
like.

The subject gene may be employed for producing all or portions of the
patched protein. The subject gene or fragment thereof, generally a fragment of at
least 12 bp, usually at least 18 bp, may be introduced into an appropriate vector for
extrachromosomal maintenance or for integration into the host. Fragments will
usually be immediately joined at the 5' and/or 3' terminus to a nucleotide or
sequence not found in the natural or wild-type gene, or joined to a label other than a
nucleic acid sequence. For expression, an expression cassette may be employed,
providing for a transcriptional and translational initiation region, which may be
inducible or constitutive, the coding region under the transcriptional control of the
transcriptional initiation region, and a transcriptional and translational termination
region. Various transcriptional initiation regions may be employed which are
functional in the expression host. The peptide may be expressed in prokaryotes or
eukaryotes in accordance with conventional ways, depending upon the purpose for
expression. For large production of the protein, a unicellular organism or cells of a
higher organism, e.g. eukaryotes such as vertebrates, particularly mammals, may be
used as the expression host, such as E. coli, B. subtilis, S. cerevisiae, and the like.
In many situations, it may be desirable to express the patched gene in a mammalian
host, whereby the patched protein will be transported to the cellular membrane for
various studies. The protein has two parts which provide for a total of six
transmembrane regions, with a total of six extracellular loops, three for each part.
The character of the protein has similarity to a transporter protein. The protein has
two conserved glycosylation signal triads.

The subject nucleic acid sequences may be modified for a number of
purposes, particularly where they will be used intracellularly, for example, by being
joined to a nucleic acid cleaving agent, e.g. a chelated metal ion, such as iron or
chromium for cleavage of the gene; as an antisense sequence; or the like.
Modifications may include replacing oxygen of the phosphate esters with sulfur or
nitrogen, replacing the phosphate with phosphoramide, etc.

With the availability of the protein in large amounts by employing an
expression host, the protein may be isolated and purified in accordance with
conventional ways. A lysate may be prepared of the expression host and the lysate
purified using HPLC, exclusion chromatography, gel electrophoresis, affinity
chromatography, or other purification technique. The purified protein will generally
be at least about 80% pure, preferably at least about 90% pure, and may be up to
100% pure. By pure is intended free of other proteins, as well as cellular debris.

The polypeptide may be used for the production of antibodies, where short
fragments provide for antibodies specific for the particular polypeptide, whereas
larger fragments or the entire protein allow for the production of antibodies over the
surface of the polypeptide or protein, where the protein may be in its natural
conformation.

Antibodies may be prepared in accordance with conventional ways, where 
the expressed polypeptide or protein may be used as an immunogen, by itself or

obtain further 5' sequence to ensure that one has at least a functional portion of the
enhancer. It is found that the enhancer is proximal to the 5' coding region, a
portion being in the transcribed sequence and downstream from the promoter
sequences. The transcriptional initiation region may be used for many purposes:
studying embryonic development, providing for regulated expression of patched
protein or other protein of interest during embryonic development or thereafter, and
in gene therapy.

The gene may also be used for gene therapy, by transfection of the normal
gene into embryonic stem cells or into mature cells. A wide variety of viral vectors
can be employed for transfection and stable integration of the gene into the genome
of the cells. Alternatively, micro-injection, fusion, or the like may be employed for
introduction of genes into a suitable host cell. See, for example, Dhawan et al.,
Science 254, 1509-1512 (1991) and Smith et al., Molecular and Cellular Biology
(1990) 3268-3271.

By providing for the production of large amounts of PTC protein, one can
use the protein for identifying ligands which bind to the PTC protein. Particularly,
one may produce the protein in cells and employ the polysomes in columns for
isolating ligands for the PTC protein. One may incorporate the PTC protein into
liposomes by combining the protein with appropriate lipid surfactants, e.g.
phospholipids, cholesterol, etc., and sonicate the mixture of the PTC protein and the
surfactants in an aqueous medium. With one or more established ligands, e.g.
hedgehog, one may use the PTC protein to screen for antagonists which inhibit the
binding of the ligand. In this way, drugs may be identified which can prevent the
transduction of signals by the PTC protein in normal or abnormal cells.
The PTC protein, particularly binding fragments thereof, the gene encoding
the protein, or fragments thereof, particularly fragments of at least about 18
nucleotides, frequently of at least about 30 nucleotides and up to the entire gene,
more particularly sequences associated with the hydrophilic loops, may be employed
in a wide variety of assays. In these situations, the particular molecules will
normally be joined to another molecule, serving as a label, where the label can
directly or indirectly provide a detectable signal. Various labels include
radioisotopes, fluorescers, chemiluminescers, enzymes, specific binding molecules,
particles, e.g. magnetic particles, and the like. Specific binding molecules include
pairs, such as biotin and streptavidin, digoxin and antidigoxin, etc. For the specific
binding members, the complementary member would normally be labeled with a
molecule which provides for detection, in accordance with known procedures. The
assays may be used for detecting the presence of molecules which bind to the
patched gene or PTC protein, in isolating molecules which bind to the patched gene,
for measuring the amount of patched, either as the protein or the message, for
identifying molecules which may serve as agonists or antagonists, or the like.

Various formats may be used in the assays. For example, mammalian or
invertebrate cells may be designed where the cells respond when an agonist binds to
PTC in the membrane of the cell. An expression cassette may be introduced into
the cell, where the transcriptional initiation region of patched is joined to a marker
gene, such as β-galactosidase, for which a substrate forming a blue dye is available.
A 1.5 kb fragment that responds to PTC signaling has been identified and shown to
regulate expression of a heterologous gene during embryonic development. When
an agonist binds to the PTC protein, the cell will turn blue. By employing a
competition between an agonist and a compound of interest, absence of blue color
formation will indicate the presence of an antagonist. These assays are well known
in the literature. Instead of cells, one may use the protein in a membrane
environment and determine binding affinities of compounds. The PTC may be
bound to a surface and a labeled ligand for PTC employed. A number of labels
have been indicated previously. The candidate compound is added with the labeled
ligand in an appropriate buffered medium to the surface bound PTC. After an
incubation to ensure that binding has occurred, the surface may be washed free of
any non-specifically bound components of the assay medium, particularly any
non-specifically bound labeled ligand, and any label bound to the surface determined.
Where the label is an enzyme, substrate producing a detectable product may be used.
The label may be detected and measured. By using standards, the binding affinity of
the candidate compound may be determined.
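
The text does not say how the binding affinity is calculated from the competition
data. One commonly used relation for simple competitive binding at equilibrium is the
Cheng-Prusoff equation, Ki = IC50 / (1 + [L]/Kd), which is offered here purely as an
assumed illustration and not as the method of the disclosure. The function name
`cheng_prusoff_ki` and the numerical values are hypothetical.

```python
def cheng_prusoff_ki(ic50, ligand_conc, ligand_kd):
    """Estimate a competitor's Ki from its IC50 (all values in the same units).

    Assumes a single binding site, reversible competitive binding, and
    equilibrium conditions; none of these is stated in the source text.
    """
    if ic50 <= 0 or ligand_conc < 0 or ligand_kd <= 0:
        raise ValueError("ic50 and ligand_kd must be positive; ligand_conc non-negative")
    return ic50 / (1.0 + ligand_conc / ligand_kd)


# Hypothetical numbers: 100 nM IC50 measured with 10 nM labeled ligand of Kd 10 nM.
print(cheng_prusoff_ki(ic50=100.0, ligand_conc=10.0, ligand_kd=10.0))  # 50.0
```

The IC50 itself would come from a dose-response curve of bound label against
candidate compound concentration, calibrated with the standards mentioned above.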
The availability of the gene and the protein allows for investigation of the
development of the fetus and the role patched and other molecules play in such
development. By employing antisense sequences of the patched gene, where the
sequences may be introduced in cells in culture, or a vector providing for
transcription of the antisense of the patched gene introduced into the cells, one can
investigate the role the PTC protein plays in the cellular development. By providing
for the PTC protein or fragment thereof in a soluble form which can compete with
the normal cellular PTC protein for ligand, one can inhibit the binding of ligands to
the cellular PTC protein to see the effect of variation in concentration of ligands for
the PTC protein on the cellular development of the host. Antibodies against PTC
can also be used to block function, since PTC is exposed on the cell surface.
The subject gene may also be used for preparing transgenic laboratory
animals, which may serve to investigate embryonic development and the role the
PTC protein plays in such development. By providing for variation in the
expression of the PTC protein, employing different transcriptional initiation regions
which may be constitutive or inducible, one can determine the developmental effect
of the differences in PTC protein levels. Alternatively, one can use the DNA to
knock out the PTC protein in embryonic stem cells, so as to produce hosts with only
a single functional patched gene or where the host lacks a functional patched gene.
By employing homologous recombination, one can introduce a patched gene which
is differentially regulated, for example, is expressed during the development of the
fetus, but not in the adult. One may also provide for expression of the patched gene in
cells or tissues where it is not normally expressed or at abnormal times of
development. One may provide for mis-expression or failure of expression in
certain tissue to mimic a human disease. Thus, mouse models of spina bifida or
abnormal motor neuron differentiation in the developing spinal cord are made
available. In addition, by providing expression of PTC protein in cells in which it is
otherwise not normally produced, one can induce changes in cell behavior upon
binding of ligand to the PTC protein.

Areas of investigation may include the development of cancer treatments.
The wingless gene, whose transcription is regulated in flies by PTC, is closely
related to a mammalian oncogene, Wnt-1, a key factor in many cases of mouse
breast cancer. Other Wnt family members, which are secreted signaling proteins,
are implicated in many aspects of development. In flies, the signaling factor
decapentaplegic, a member of the TGF-beta family of signaling proteins, known to
affect growth and development in mammals, is also controlled by PTC. Since
members of both the TGF-beta and Wnt families are expressed in mice in places
close to overlapping with patched, the common regulation provides an opportunity
in treating cancer. Also, for repair and regeneration, proliferation competent cells
making PTC protein can find use to promote regeneration and healing for damaged
tissue, which tissue may be regenerated by transfecting cells of damaged tissue with
the ptc gene and its normal transcription initiation region or a modified transcription
initiation region. For example, PTC may be useful to stimulate growth of new teeth
by engineering cells of the gums or other tissues where PTC protein was expressed
during an earlier developmental stage or is expressed.

Since Northern blot analysis indicates that ptc is present at high levels in
adult lung tissue, the regulation of ptc expression or binding to its natural ligand
may serve to inhibit proliferation of cancerous lung cells. The availability of the
gene encoding PTC and the expression of the gene allows for the development of
agonists and antagonists. In addition, PTC is central to the ability of neurons to
differentiate early in development. The availability of the gene allows for the
introduction of PTC into host diseased tissue, stimulating the fetal program of
division and/or differentiation. This could be done in conjunction with other genes
which provide for the ligands which regulate PTC activity or by providing for
agonists other than the natural ligand.

The availability of the coding region for various ptc genes from various
species allows for the isolation of the 5' non-coding region comprising the promoter
and enhancer associated with the ptc genes, so as to provide transcriptional and
post-transcriptional regulation of the ptc gene or other genes, which allow for regulation
of genes in relation to the regulation of the ptc gene. Since the ptc gene is
autoregulated, activation of the ptc gene will result in activation of transcription of a
gene under the transcriptional control of the transcriptional initiation region of the
ptc gene. The transcriptional initiation region may be obtained from any host
species and introduced into a heterologous host species, where such initiation region
is functional to the desired degree in the foreign host. For example, a fragment of
from about 1.5 kb upstream from the initiation codon, up to about 10 kb, preferably
up to about 5 kb, may be used to provide for transcriptional initiation regulated by
the PTC protein, particularly the Drosophila 5'-non-coding region (GenBank
accession no. M28418).

The following examples are offered by way of illustration and not by way of limitation.

Methods and Materials

I. PCR on Mosquito (Anopheles gambiae) Genomic DNA

PCR primers were based on amino acid stretches of fly PTC that were not
likely to diverge over evolutionary time and were of low degeneracy. Two such
primers, P2R1 (SEQ ID NO:14): GGACGAATTCAARGTNCAYCARYTNTGG and
P4R1 (SEQ ID NO:15): GGACGAATTCYTCCCARAARCANTC (the
GAATTC sequences are Eco RI linkers), amplified an appropriately sized band from
mosquito genomic DNA using the PCR. The program conditions were as follows:
94 °C 4 min.; 72 °C Add Taq;
[49 °C 30 sec.; 72 °C 90 sec.; 94 °C 15 sec.] 3 times
[94 °C 15 sec.; 50 °C 30 sec.; 72 °C 90 sec.] 35 times
72 °C 10 min.; 4 °C hold
This band was subcloned into the EcoRV site of pBluescript II and sequenced using
the USB Sequenase kit.
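
For planning purposes, the cycling program above can be written down as data and its
programmed hold time summed. The sketch below is illustrative only: the stage labels
and the function name `programmed_seconds` are assumptions, and the untimed steps
(the pause at 72 °C to add Taq and the final 4 °C hold) as well as ramp times between
temperatures are deliberately left out.

```python
# Each stage: (label, number of repeats, [(temperature in °C, seconds), ...]).
# Labels are descriptive guesses; temperatures and times follow the program text.
PROGRAM = [
    ("initial denaturation", 1, [(94, 240)]),
    ("first cycles", 3, [(49, 30), (72, 90), (94, 15)]),
    ("main cycles", 35, [(94, 15), (50, 30), (72, 90)]),
    ("final extension", 1, [(72, 600)]),
]


def programmed_seconds(program):
    """Sum the timed holds of a cycling program, ignoring ramping and pauses."""
    return sum(
        repeats * sum(seconds for _temp, seconds in steps)
        for _label, repeats, steps in program
    )


total = programmed_seconds(PROGRAM)
print(total, "s =", total / 60, "min")  # 5970 s = 99.5 min
```

Actual instrument time will be longer, since ramping between temperatures typically
adds a noticeable amount per cycle.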



II. Screen of a Butterfly cDNA Library with Mosquito PCR Product

Using the mosquito PCR product (SEQ ID NO:7) as a probe, a 3 day
embryonic Precis coenia λgt10 cDNA library (generously provided by Sean
Carroll) was screened. Filters were hybridized at 65 °C overnight in a solution
containing 5x SSC, 10% dextran sulfate, 5x Denhardt's, 200 µg/ml sonicated
salmon sperm DNA, and 0.5% SDS. Filters were washed in 0.1x SSC, 0.1% SDS
at room temperature several times to remove nonspecific hybridization. Of the
100,000 plaques initially screened, 2 overlapping clones, L1 and L2, were isolated
which corresponded to the N terminus of butterfly PTC. Using L2 as a probe, the
library filters were rescreened and 3 additional clones (L5, L7, L8) were isolated
which encompassed the remainder of the ptc coding sequence. The full length
sequence of butterfly ptc (SEQ ID NO:3) was determined by ABI automated
sequencing.
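
Preparing the hybridization solution described in this example comes down to C1V1 = C2V2
dilutions from concentrated stocks. The following sketch shows the arithmetic for a
100 ml batch. The stock concentrations (20x SSC, 50% dextran sulfate, 50x Denhardt's,
10 mg/ml salmon sperm DNA, 10% SDS) are assumptions chosen for illustration, since the
text gives only final concentrations; the function name `stock_volumes` is likewise
hypothetical.

```python
def stock_volumes(final_volume_ml, recipe):
    """Return the volume of each stock (ml) plus water for a mixed solution.

    `recipe` maps component name -> (final concentration, stock concentration),
    both in the same units for that component.
    """
    volumes = {
        name: final_volume_ml * final / stock
        for name, (final, stock) in recipe.items()
    }
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested final volume")
    volumes["water"] = water
    return volumes


# Final concentrations from the text; stock concentrations are assumed.
HYB_RECIPE = {
    "SSC (x)": (5, 20),
    "dextran sulfate (%)": (10, 50),
    "Denhardt's (x)": (5, 50),
    "salmon sperm DNA (mg/ml)": (0.2, 10),  # 200 µg/ml final
    "SDS (%)": (0.5, 10),
}

for component, ml in stock_volumes(100, HYB_RECIPE).items():
    print(f"{component}: {ml:.1f} ml")
```

With these assumed stocks, a 100 ml batch takes 25 ml SSC, 20 ml dextran sulfate,
10 ml Denhardt's, 2 ml DNA, 5 ml SDS and 38 ml water.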



III. Screen of a Tribolium Genomic Library with Mosquito PCR Product
and a Fragment from the Butterfly Clone

A λgem11 genomic library from Tribolium castaneum (gift of Rob Dennell)
was probed with a mixture of the mosquito PCR product (SEQ ID NO:7) and a
BstXI/EcoRI fragment of L2. Filters were hybridized at 55 °C overnight and
washed as above. Of the 75,000 plaques screened, 14 clones were identified and the
SacI fragment of T8 (SEQ ID NO:1), which crosshybridized with the mosquito and
butterfly probes, was subcloned into pBluescript.




IV. PCR on Mouse cDNA Using Degenerate Primers Derived from Regions
Conserved in the Four Insect Homologues
Two degenerate PCR primers (P4REV: (SEQ ID NO:16)
GGACGAATTCYTNGANTGYTTYTGGGA; P22: (SEQ ID NO:17)
CATACCAGCCAAGCTTGTCIGGCCARTGCAT) were designed based on a
comparison of PTC amino acid sequences from fly (Drosophila melanogaster) (SEQ
ID NO:6), mosquito (Anopheles gambiae) (SEQ ID NO:8), butterfly (Precis
coenia) (SEQ ID NO:4), and beetle (Tribolium castaneum) (SEQ ID NO:2). I
represents inosine, which can form base pairs with all four nucleotides. P22 was
used to reverse transcribe RNA from 12.5 dpc mouse limb bud (gift from David
Kingsley) for 90 min at 37 °C. PCR using P4REV (SEQ ID NO:16) and P22 (SEQ
ID NO:17) was then performed on 1 µl of the resultant cDNA under the following
conditions:

94 °C 4 min.; 72 °C Add Taq;
[94 °C 15 sec.; 50 °C 30 sec.; 72 °C 90 sec.] 35 times
72 °C 10 min.; 4 °C hold

PCR products of the expected size were subcloned into the TA vector (Invitrogen)
and sequenced with the Sequenase Version 2.0 DNA Sequencing Kit (U.S.B.).
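The degeneracy of a primer is the number of distinct oligonucleotides its ambiguity codes stand for. The following is a minimal sketch, assuming the standard IUPAC meanings Y = C/T, R = A/G and N = any base, and treating inosine (I) as a single synthesized base rather than a mixture. Only the degenerate 3' portions of P4REV and P22 that are legible in the text are used; the helper names `degeneracy` and `expand` are illustrative.

```python
from itertools import product

# Bases each symbol can stand for. Inosine (I) is kept as one symbol because
# it is a single synthesized base that pairs with all four nucleotides.
SYMBOLS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT", "I": "I",
}


def degeneracy(primer):
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    count = 1
    for symbol in primer:
        count *= len(SYMBOLS[symbol])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(SYMBOLS[s] for s in primer))]


# Degenerate 3' portions of the two primers as printed in the text.
P4REV_3PRIME = "YTNGANTGYTTYTGGGA"
P22_3PRIME = "CIGGCCARTGCAT"

if __name__ == "__main__":
    print(degeneracy(P4REV_3PRIME))  # three Y and two N positions
    print(expand(P22_3PRIME))        # one R position, inosine left as I
```

Because the 5' tails of the primers carry no ambiguity codes, they would not change these counts; using inosine at one position of P22 keeps that primer at two species instead of eight.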
Using the cloned mouse PCR fragment as a probe, 300,000 plaques of a
mouse 8.5 dpc λgt10 cDNA library (a gift from Brigid Hogan) were screened at
65 °C as above and washed in 2x SSC, 0.1% SDS at room temperature. 7 clones
were isolated, and three (M2, M4, and M8) were subcloned into pBluescript II.
200,000 plaques of this library were rescreened using first, a 1.1 kb EcoRI fragment
from M2 to identify 6 clones (M9-M16) and secondly a mixed probe containing the
most N terminal (XhoI fragment from M2) and most C terminal sequences
(BamHI/BglII fragment from M9) to isolate 5 clones (M17-M21). M9, M10, M14,
and M17-21 were subcloned into the EcoRI site of pBluescript II (Stratagene).

V. RNA Blots and in situ Hybridizations in Whole and Sectioned Mouse Embryos

Northerns:

A mouse embryonic Northern blot and an adult multiple tissue Northern blot
(obtained from Clontech) were probed with a 900 bp EcoRI fragment from an N
terminal coding region of mouse ptc. Hybridization was performed at 65 °C in 5x
SSPE, 10x Denhardt's, 100 µg/ml sonicated salmon sperm DNA, and 2% SDS.
After several short room temperature washes in 2x SSC, 0.05% SDS, the blots were
washed at high stringency in 0.1X SSC, 0.1% SDS at 50 °C.
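The difference between the low and high stringency washes can be made concrete by comparing their sodium ion concentrations. The following is a minimal sketch, assuming the standard recipe in which 1x SSC is 150 mM NaCl and 15 mM trisodium citrate; that recipe is not stated in the text, and the function name `sodium_mm` is illustrative.

```python
# Composition of 1x SSC in mM (standard laboratory recipe, assumed here;
# the specification does not spell it out).
NACL_MM_1X = 150.0
TRISODIUM_CITRATE_MM_1X = 15.0


def sodium_mm(ssc_strength):
    """Approximate Na+ concentration (mM) of an SSC wash of the given strength.

    Each trisodium citrate contributes three sodium ions.
    """
    sodium_per_1x = NACL_MM_1X + 3 * TRISODIUM_CITRATE_MM_1X
    return ssc_strength * sodium_per_1x


if __name__ == "__main__":
    low_stringency = sodium_mm(2.0)    # 2x SSC room temperature washes
    high_stringency = sodium_mm(0.1)   # 0.1X SSC wash at 50 deg C
    print(f"2x SSC:   {low_stringency:.1f} mM Na+")
    print(f"0.1X SSC: {high_stringency:.1f} mM Na+")
    print(f"fold reduction: {low_stringency / high_stringency:.0f}")
```

Lower salt destabilizes imperfectly matched hybrids, which is why the 0.1X SSC wash, carried out at elevated temperature, is the high stringency step.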
In situ hybridization of sections: 

7.75, 8.5, 11.5, and 13.5 dpc mouse embryos were dissected in PBS and
frozen in Tissue-Tek medium at -80 °C. 12-16 µm frozen sections were cut,
collected onto VectaBond (Vector Laboratories) coated slides, and dried for 30-60
minutes at room temperature. After a 10 minute fixation in 4% paraformaldehyde in
PBS, the slides were washed 3 times for 3 minutes in PBS, acetylated for 10 minutes
in 0.25% acetic anhydride in triethanolamine, and washed three more times for 5
minutes in PBS. Prehybridization (50% formamide, 5X SSC, 250 µg/ml yeast
tRNA, 500 µg/ml sonicated salmon sperm DNA, and 5x Denhardt's) was carried
out for 6 hours at room temperature in 50% formamide/5x SSC humidified
chambers. The probe, which consisted of 1 kb from the N-terminus of ptc, was
added at a concentration of 200-1000 ng/ml into the same solution used for
prehybridization, and then denatured for five minutes at 80 °C. Approximately 75
µl of probe were added to each slide and covered with Parafilm. The slides were
incubated overnight at 65 °C in the same humidified chamber used previously. The
following day, the probe was washed successively in 5X SSC (5 minutes, 65 °C),
0.2X SSC (1 hour, 65 °C), and 0.2X SSC (10 minutes, room temperature). After
five minutes in buffer B1 (0.1 M maleic acid, 0.15 M NaCl, pH 7.5), the slides were
blocked for 1 hour at room temperature in 1% blocking reagent (Boehringer-Mannheim)
in buffer B1, and then incubated for 4 hours in buffer B1 containing the
DIG-AP conjugated antibody (Boehringer-Mannheim) at a 1:5000 dilution. Excess
antibody was removed during two 15 minute washes in buffer B1, followed by five
minutes in buffer B3 (100 mM Tris, 100 mM NaCl, 5 mM MgCl2, pH 9.5). The
antibody was detected by adding an alkaline phosphatase substrate (350 µl 75 mg/ml
X-phosphate in DMF, 450 µl 50 mg/ml NBT in 70% DMF in 100 ml of buffer B3)
and allowing the reaction to proceed overnight in the dark. After a brief rinse in 10
mM Tris, 1 mM EDTA, pH 8.0, the slides were mounted with Aquamount (Lerner
Laboratories).
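The working concentrations of the two chromogenic components follow from the stock volumes given for the alkaline phosphatase substrate. The following is a minimal sketch, assuming the garbled volume units of the two stocks read as microlitres and the buffer volume as 100 ml; the function name `final_mg_per_ml` is illustrative.

```python
def final_mg_per_ml(stock_mg_per_ml, stock_ul, total_ml):
    """Concentration after diluting `stock_ul` of stock into `total_ml` total."""
    return stock_mg_per_ml * (stock_ul / 1000.0) / total_ml


BUFFER_B3_ML = 100.0
X_PHOSPHATE = {"stock_mg_per_ml": 75.0, "stock_ul": 350.0}
NBT = {"stock_mg_per_ml": 50.0, "stock_ul": 450.0}

# Total volume includes the two small stock additions.
TOTAL_ML = BUFFER_B3_ML + (X_PHOSPHATE["stock_ul"] + NBT["stock_ul"]) / 1000.0

if __name__ == "__main__":
    x_phos = final_mg_per_ml(total_ml=TOTAL_ML, **X_PHOSPHATE)
    nbt = final_mg_per_ml(total_ml=TOTAL_ML, **NBT)
    print(f"X-phosphate: {x_phos:.3f} mg/ml")
    print(f"NBT:         {nbt:.3f} mg/ml")
```

Under these assumptions both components end up at roughly a quarter of a milligram per millilitre in buffer B3.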

VI. Drosophila 5'-transcriptional initiation region β-gal constructs.

A series of constructs were designed that link different regions of the ptc
promoter from Drosophila to a LacZ reporter gene in order to study the cis
regulation of the ptc expression pattern. See Fig. 1. A 10.8 kb BamHI/BspMI
fragment comprising the 5'-non-coding region of the mRNA at its 3'-terminus was
obtained and truncated by restriction enzyme digestion as shown in Fig. 1. These
expression cassettes were introduced into Drosophila lines using a P-element vector
(Thummel et al., Gene 74, 445-456 (1988)), which were injected into embryos,
providing flies which could be grown to produce embryos. (See Spradling and
Rubin, Science (1982) 218, 341-347 for a description of the procedure.) The vector
used a pUC8 background into which was introduced the white gene to provide for
yellow eyes and portions of the P-element for integration, and the constructs were
inserted into a polylinker upstream from the LacZ gene. The resulting embryos
were stained using antibodies to LacZ protein conjugated to HRP and the embryos
developed with OPD dye to identify the expression of the LacZ gene. The staining
pattern is described in Fig. 1, indicating whether there was staining during the early
and late development of the embryo.

VII. Isolation of a Mouse ptc Gene
Homologues of fly PTC (SEQ ID NO:6) were isolated from three insects:
mosquito, butterfly and beetle, using either PCR or low stringency library screens.
PCR primers to six amino acid stretches of PTC of low mutability and degeneracy
were designed. One primer pair, P2 and P4, amplified an homologous fragment of
ptc from mosquito genomic DNA that corresponded to the first hydrophilic loop of
the protein. The 345 bp PCR product (SEQ ID NO:7) was subcloned and sequenced
and when aligned to fly PTC, showed 67% amino acid identity.

The cloned mosquito fragment was used to screen a butterfly λgt10 cDNA
library. Of 100,000 plaques screened, five overlapping clones were isolated and
used to obtain the full length coding sequence. The butterfly PTC homologue (SEQ
ID NO:4) is 1,311 amino acids long and overall has 50% amino acid identity (72%
similarity) to fly PTC. With the exception of a divergent C-terminus, this homology
is evenly spread across the coding sequence. The mosquito PCR clone (SEQ ID
NO:7) and a corresponding fragment of butterfly cDNA were used to screen a beetle
λgem11 genomic library. Of the plaques screened, 14 clones were identified. A
fragment of one clone (T8), which hybridized with the original probes, was
subcloned and sequenced. This 3 kb piece contains an 89 amino acid exon (SEQ ID
NO:2) which is 44% and 51% identical to the corresponding regions of fly and
butterfly PTC respectively.
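The relationship between the beetle genomic fragment (SEQ ID NO:1) and the deduced exon (SEQ ID NO:2) can be checked by conceptual translation. The following is a minimal sketch using the standard genetic code; the 57 nucleotide stretch is copied from the fifth row of SEQ ID NO:1 in the sequence listing (positions 242-298, where no ambiguous bases occur), and the function name `translate` is illustrative.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons (e.g. with N) give X."""
    protein = []
    usable = len(dna) - len(dna) % 3          # ignore a trailing partial codon
    for start in range(0, usable, 3):
        codon = dna[start:start + 3].upper()
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# Positions 242-298 of SEQ ID NO:1 as printed in the sequence listing.
BEETLE_FRAGMENT = "ATGCACTGGCCCGAACACTTGATCGTTGCCGTTCCAATAAGAATAAATCTGGTCATA"

if __name__ == "__main__":
    print(translate(BEETLE_FRAGMENT))
```

The translation, MHWPEHLIVAVPIRINLVI, corresponds to residues 30-48 of SEQ ID NO:2 (Met His Trp Pro Glu His Leu Ile Val Ala Val Pro Ile Arg Ile Asn Leu Val Ile), which supports reading the exon in this frame.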

Using an alignment of the four insect homologues in the first hydrophilic loop
of the PTC, two PCR primers were designed to a five and six amino acid stretch
which were identical and of low degeneracy. These primers were used to isolate the
mouse homologue using RT-PCR on embryonic limb bud RNA. An appropriately
sized band was amplified and upon cloning and sequencing, it was found to encode a
protein 65% identical to fly PTC. Using the cloned PCR product and subsequently
fragments of mouse ptc cDNA, a mouse embryonic λ cDNA library was screened.
From about 300,000 plaques, 17 clones were identified and of these 7 form
overlapping cDNAs which comprise most of the protein-coding sequence (SEQ ID
NO:9).
In both the embryonic and adult Northern blots, the ptc probe detects a single
[...] message. Further exposure does not reveal any additional minor bands.
[...] abundant by 11 and 15 dpc. While the gene is still present at 17 dpc, the
[...] spleen, skeletal muscle, and testes.

[...] ptc mRNA is present at 7 dpc, while there is [...] of transcription. In
contrast, ptc is present [...] developing lung buds and gut, consistent with the
adult Northern profile. In [...] strongly transcribed in the condensing cartilage
of 11.5 and [...] dpc [...] wide range of tissues from endodermal, mesodermal and
ectodermal origin, supporting its fundamental role in embryonic development.

VIII. Isolation of the Human ptc Gene

To isolate human ptc (hptc), 2 x 10[...] plaques from a human lung cDNA library
[...] were screened [...]. Filters were hybridized overnight at reduced stringency
(60 °C in [...] SSC, [...] 5X Denhardt's, 0.2 mg/ml sonicated salmon sperm DNA,
and [...] SDS). Two positive plaques (H1 and H2) were isolated, the inserts cloned
into pBluescript, and upon sequencing, [...] contained sequence highly similar to the
mouse ptc homolog. To isolate the 5' end, an additional [...] plaques were
screened in duplicate with M2-3 EcoR I and M2-3 Xho I (containing [...]
sequence of mouse ptc) probes. Ten plaques were purified and of these, 6 inserts
were subcloned into pBluescript. To obtain the full coding sequence, H2 was fully
and H14, H20, and H21 were partially sequenced. The 5.1 kbp of human ptc
sequence (SEQ ID NO:18) contains an open reading frame of 1447 amino acids
(SEQ ID NO:19) that is 96% identical and 98% similar to mouse ptc. The 5' and 3'
untranslated sequences of human ptc (SEQ ID NO:18) are also highly similar to
mouse ptc (SEQ ID NO:9), suggesting conserved regulatory sequence.

IX. Comparison of Mouse, Human, Fly and Butterfly Sequences

The deduced mouse PTC protein sequence (SEQ ID NO:10) has about 38%
identical amino acids to fly PTC over about 1,200 amino acids. This amount of
conservation is dispersed through much of the protein excepting the C-terminal
region. The mouse protein also has a 50 amino acid insert relative to the fly
protein. Based on the sequence conservation of PTC and the functional conservation
of hedgehog between fly and mouse, one concludes that ptc functions similarly in
the two organisms. A comparison of the amino acid sequences of mouse (mptc)
(SEQ ID NO:10), human (hptc) (SEQ ID NO:19), butterfly (bptc) (SEQ ID NO:4)
and Drosophila (ptc) (SEQ ID NO:6) is shown in Table 1.

TABLE 1

Alignment of human, mouse, fly, and butterfly PTC homologs

U£?C ^2^ TODR -- < ^ S ^ I ^ GRPAGGG RRRRTGGLRRAAAPDR D yLHRPSyCDA 

ptc ^ AGN ^:::::-:-- ^grqagggrj^rtc^phra-apdrdylhrpsycda 

PTC M DRDSLPRVPDTHGD — WDE KLFSDL YI-RTSWVDA 

BPTC MVAPDSEAPSNPRITAAHES PCATEA RHSADL 

★ * * ** 



35 BP?C °S^ T I ^^^AIYIJ lS VFQ S H I iTLGSSVQKHAGKVa.FVAILVl^TFCVGLKS 
BPTC AIALSELEKGNIEGGRTSLWIRAWLQEQLFILGCFI^GDAGKVLFVAILVLSTFCVGLKS 

*• * *• .* * ** .*.*** *..*....* *•**. 



^™^ L ^ VTCRVSRE ^ TRQKICT ^™ p Q^WPKEE(»NVLTTEALLQH 

IQE ^ RI ^^^ QCTIGEDESATH OLLXQTTHDPNASVLHPCALLAH 

AQIHTRVrjQLWQEGGRLEAELKYTAQALGEADSSTHQLVIQTAKDPDVSLLHPGALLEH 
*..**.. ***. ** »* . .* ♦«* . ... . 



HPTC 



LDSALCASRVHVYMYNRQWKLEHLCyKSGELITET-GYMDQIIEY'LyPCLIITPLDCFWE 
„„, * rK ppTj» WTNFDPLEFLEELK KINYQVDSWEEMLNKAEV 

< ^!rT^SSIIS----OTNFDPLEFLEELK KINVQVDSWEEMLNKAEV 

*. .* * . * * ** . •* • • • 

— AKTRKNDKTHRID-TTRQPLDPDVS 
YHDSFVRVPHVIKNDNGGLPDFWLT J T.F$lTWT.r=MT.rkirT r*ne»T*v»r\^r»T 




J^S"™ G ^ N ^ raLVLT - raLWSDG IINQR^NYLSAWATNDVFAYGASQG 

• • •***• ***** *■** #* *•* * + ** ** + 

• • > • 

NIRPHRPEWVHDKADYMPETRIJIIPAAEPIEYAQFPWU*GIJIDTSDFVH^EKVRTICS 
OTRPTOPEWVHDKADYMPETRLRIPAAEPIEYAQFPFYLNGLRDTSDFVEAIEKVRVICN 

^^! RQYFHQPNEY DLKIPKSLPLVYAQMPFYLHGLTDTSQIKTLIGHIRDLSV 

NLKPQPQRWIHSPEDV HLEIKKSSPLIYTQLPFYLSGLSDTDSIKTLIRSVRDLGL 

*• * • * * . *. * * **** ** ** - 




?^^«n™^rff mVF ^ TSPFEFVIRHFOTLLLWL ^ < ^ NS I'I'VFPILLSMVG 
ESVIAPVVHGALAAALAASMLAASEFGFVARLFLRLLLALVFLGLIDGLLFFPIVLSILG 
• .* . .... * 

B^^f^ G ^ LWPSPEPPPSVVRFAMPPGHTHSGS DSSDSEySSQTTVSGI,SE-EL 
«Sy?^ GLNRLPTPSPEPPPSV VRFAVPPGHTNNGSDSSDSEYSSQTTVSGISE-EL 

PAAEVRPI EHPERLSTPS PKCS PI HPRKSSS SSGGGDKSSRTS --KSAPRPC APSL 

*.*.*. .*..♦*** * . . • , . 

o^^^ G ^ PMQ ^^ TENPVFAHSTVVHPESR «HPPSNPRQQPHLDSGSLPPGRQ 
^^2£^ PMWIVWTENPVFARSTVVHPDSRH Q pp WPRQQPHLDSGSLSPGRQ 

^tJp^^?? ssxqmp ^ Y0PREQ -- rp ^^ pp ^hkaaaqqhhqhqgpp? 

TTITEEPSSWHSSAHSVQSSMQSIWQPEVWETTTYNGSDSASGRSTPTKSSHGGAITT 

* * • ♦ * » 

• • • 

^o^^^ GLWPPLYRPR ^^ EISTEGHSGPSNRARWGPR GARSHNPRNPASTAMG 
^Son™ REG ^ PPPYRPRRDAFEIS TEGHSGPSNRDRSGPRGARSHNPRNPTSTAMG 
TPPPPFPTA YPPELQS I WQPEVTVETTHS DS 

TKVTATANIKVEVVTPSDRKSRRSYHYYDRRRDRDEDRDRDRERDRDRDRDRDRDRDRDR 

^^ G 3^^irZ^^^ AVHPPPVPGPGRNPRGGLCPGY PETDHGLFEDPHVP 

SSVPSYCQPITTVTASASVTVAVHPP — PGPGRNPRGGPCPGYESYPETDHGVFEDPHVP 

NT TKVT ATAN I KVE LAM P GRAVRS YNFTS 

DR DRERSRERDRRDRYRD ERDHRA SPRENGRDSGHE 

* * . 

FHVRCERRDSKVEVIELQDVECEERPRGSSSN 
FHVRCERRDSKVEVIELQDVECEERPWGSSSN 
The identity of ten other clones recovered from the mouse library is not
determined. These cDNAs cross-hybridize with mouse ptc sequence, while differing
as to their restriction maps. These genes encode a family of proteins related to the
patched protein. Alignment of the human and mouse nucleotide sequences, which
includes coding and noncoding sequence, reveals 89% identity.
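The percent identity figures quoted in this specification are ratios of matching positions to compared positions in an alignment. The following is a minimal sketch of that calculation, applied to the final two rows of the alignment in Table 1 as printed (the C-terminal 32 residues of the two mammalian sequences). The function name `percent_identity` and the choice to skip gapped columns are assumptions made for illustration and are not taken from the specification.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of aligned, ungapped columns in which the two rows agree."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(row_a, row_b):
        if residue_a == gap or residue_b == gap:
            continue                      # columns with a gap are not scored
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Final two rows of the Table 1 alignment, as printed.
C_TERMINUS_ROW_1 = "FHVRCERRDSKVEVIELQDVECEERPRGSSSN"
C_TERMINUS_ROW_2 = "FHVRCERRDSKVEVIELQDVECEERPWGSSSN"

if __name__ == "__main__":
    print(percent_identity(C_TERMINUS_ROW_1, C_TERMINUS_ROW_2))
```

The two rows differ at a single position (R versus W), giving 31 matches out of 32 compared residues for this segment; whole-sequence figures such as those quoted above depend on the full alignment and on how gaps are treated.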

In accordance with the subject invention, mammalian patched genes, including
the mouse and human genes, are provided which allow for high level production of
the patched protein, which can serve many purposes. The patched protein may be
used in screening for agonists and antagonists, for isolation of its ligand,
particularly hedgehog, more particularly Sonic hedgehog, and for assaying for the
transcription of the ptc mRNA. The protein or fragments thereof may be used to
produce antibodies specific for the protein or specific epitopes of the protein. In
addition, the gene may be employed for investigating embryonic development, by
screening fetal tissue, preparing transgenic animals to serve as models, and the like.

All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application were
specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of
illustration and example for purposes of clarity of understanding, it will be readily
apparent to those of ordinary skill in the art in light of the teachings of this invention
that certain changes and modifications may be made thereto without departing from
the spirit or scope of the appended claims.
(1) GENERAL INFORMATION:

(i) APPLICANT: BOARD OF TRUSTEES OF THE STANFORD JUNIOR

(ii) TITLE OF INVENTION: Patched Genes and their Use
(iii) NUMBER OF SEQUENCES: 19

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Flehr, Hohbach, Test, Albritton & Herbert

(B) STREET: Four Embarcadero Center, Suite 3400 

(C) CITY: San Francisco 

(D) STATE: CA 

(E) COUNTRY: US 

(F) ZIP: 94111 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS /MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/US95 / 

(B) FILING DATE: 06-OCT-1995

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Rowland, Bertram I 

(B) REGISTRATION NUMBER: 20015 

(C) REFERENCE/ DOCKET NUMBER: a60190-l 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 415-781-1989

(B) TELEFAX: 415-398-3249 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 736 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
AACNNCNNTN NATGGCACCC CCNCCCAACC TTTNNNCCNN NTAANCAAAA NNCCCCNTTT 60
NATACCCCCT NTAANANTTT TCCACCNNNC NNAAANNCCN CTGNANACNA NGNAAANCCN 120

TTTTTNAACC CCCCCCACCC GGAATTCCNA NTNNCCNCCC CCAAATTACA ACTCCAGNCC 180 

AAAATTNANA NAATTGGTCC TAACCTAACC NATNGTTGTT ACGGTTTCCC CCCCCAAATA 240 

CATGCACTGG CCCGAACACT TGATCGTTGC CGTTCCAATA AGAATAAATC TGGTCATATT 300 

AAACAAGCCN AAAGCTTTAC AAACTGTTGT ACAATTAATG GGCGAACACG AACTGTTCGA 360 

ATTCTGGTCT GGACATTACA AAGTGCACCA CATCGGATGG AACCAGGAGA AGGCCACAAC 420 

CGTACTGAAC GCCTGGCAGA AGAAGTTCGC ACAGGTTGGT GGTTGGCGCA AGGAGTAGAG 480 

TGAATGGTGG TAATTTTTGG TTGTTCCAGG AGGTGGATCG TCTGACGAAG AGCAAGAAGT 540 

CGTCGAATTA CATCTTCGTG ACGTTCTCCA CCGCCAATTT GAACAAGATG TTGAAGGAGG 600 

CGTCGAANAC GGACGTGGTG AAGCTGGGGG TGGTGCTGGG GGTGGCGGCG GTGTACGGGT 660 

GGGTGGCCCA GTCGGGGCTG GCTGCCTTGG GAGTGCTGGT CTTNGCGNGC TNCNATTCGC 720 

CCTATAGTNA GNCGTA 736 
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 107 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Xaa Pro Pro Pro Asn Tyr Asn Ser Xaa Pro Lys Xaa Xaa Xaa Leu Val
1 5 10 15

Leu Thr Pro Xaa Val Val Thr Val Ser Pro Pro Lys Tyr Met His Trp
20 25 30

Pro Glu His Leu Ile Val Ala Val Pro Ile Arg Ile Asn Leu Val Ile
35 40 45

Leu Asn Lys Pro Lys Ala Leu Gln Thr Val Val Gln Leu Met Gly Glu
50 55 60

His Glu Leu Phe Glu Phe Trp Ser Gly His Tyr Lys Val His His Ile
65 70 75 80

Gly Trp Asn Gln Glu Lys Ala Thr Thr Val Leu Asn Ala Trp Gln Lys
85 90 95

Lys Phe Ala Gln Val Gly Gly Trp Arg Lys Glu
100 105
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 5187 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GGCTCTGTCA CCCGGAGCCG GAGTCCCCGG CGCCCAOCAG CGTCCTCGCG AGCCGAGCCC 60
CCAGGCGCGC CCGGAGCCCG CGGCGGCGGC CGCAACATCC CCTCCGCTGG TAACGCCGCC 120
GGGGCCCTCG GCAGGCAGGC CGCCGGCGGG AGGCGCAGAC GGACCGGGGG ACCGCACCGC 180
GCCGCCCCCG ACCGGGACTA TCTCCACCGG CCCAGCTACT GCGACGCCGC CTTCGCTCTG 240
GAGCAGATTT CCAAGGGGAA GGCTACTGGC CGGAAAGCGC CGCTGTGGCT GAGACCGAAG 300
TTTCAGAGAC TCTTATTTAA ACTGGGTTCT TACATTCAAA AGAACTGCCG CAAGTTTTTG 360
GTTGTGGGTC TCCTCATATT TGGGGCCTTC GCTGTGGGAT TAAAGGCAGC TAATCTCGAG 420
ACCAACGTGG AGGAGCTGTG GGTGGAAGTT GGTGGACGAG TGAGTCGAGA ATTAAATTAT 480
ACCCGTCAGA AGATAGGAGA AGAGGCTATG TTTAATCCTC AACTCATGAT ACAGACTCCA 540
AAAGAAGAAG GCGCTAATGT TCTGACCACA GAGGCTCTCC TGCAACACCT GGACTCAGCA 600
CTCCACGCCA GTCGTGTGCA CGTCTACATG TATAACAGGC AATGGAACTT GGAACATTTG 660
TGCTACAAAT CAGGGGAACT TATCACGGAG ACAGGTTACA TGGATCAGAT AATAGAATAC 720
CTTTACCCTT GCTTAATCAT TACACCTTTG GACTGCTTCT GGGAAGGGGC AAAGCTACAG 780
TCCGGGACAG CATACCTCCT AGGTAAGCCT CCTTTACGGT GGACAAACTT TGACCCCTTG 840
GAATTCCTAG AAGAGTTAAA GAAAATAAAC TACCAAGTGG ACAGCTGGGA CGAAATGCTG 900
AATAAAGCCG AAGTTGGCCA TGGGTACATG GACCGCCCTT GCCTCAACCC AGCCCACCCA 960
GATTGCCCTG CCACAGCCCC TAACAAAAAT TCAACCAAAC CTCTTGATGT GGCCCTTGTT 1020
TTGAATGGTG GATGTCAAGG TTTATCCACG AAGTATATGC ATTGGCAGGA GCAGTTGATT 1080
GTGGGTGGTA CCGTCAAGAA TGCCACTGCA AAACTTGTCA GCGCTCACCC CCTCCAAACC 1140
ATGTTCCAGT TAATGACTCC CAAGCAAATG TATGAACACT TCAGGGGCTA CGACTATGTC 1200
TCTCACATCA ACTGGAATCA AGACAGGGCA GCCGCCATCC TGCACCCCTG CCAGAGGACT 1260
TACGTGGAGG TGGTTCATCA AAGTGTCGCC CCAAACTCCA CTCAAAAGGT GCTTCCCTTC 1320
ACAACCACGA CCCTGGACGA CATCCTAAAA TCCTTCTCTG ATGTCAGTGT CATCCGAGTG 1380
GCCAGCGGCT ACCTACTGAT GCTTGCCTAT GCCTGTTTAA CCATGCTGCG CTGGGACTGC 1440
TCCAAGTCCC AGGGTGCCGT GGGGCTGGCT GGCGTCCTGT TGGTTGCGCT GTCAGTGGCT 1500
GCAGGATTGG GCCTCTGCTC CTTGATTGGC ATTTCTTTTA ATGCTGCGAC AACTCAGGTT 1560
TTGCCGTTTC TTGCTCTTGG TGTTGGTGTG GATGATGTCT TCCTCCTGGC CCATGCATTC 1620
AGTGAAACAG GACAGAATAA GAGGATTCCA TTTGAGGACA GGACTGGGGA GTGCCTCAAG 1680
CGCACCGGAG CCAGCGTGGC CCTCACCTCC ATCAGCAATG TCACCCCCTT CTTCATGGCC 1740
GCATTGATCC CTATCCCTGC CCTGCGAGCG TTCTCCCTCC AGGCTGCTGT GGTGGTGGTA 1800
TTCAATTTTG CTATGGTTCT GCTCATTTTT CCTGCAATTC TCAGCATGGA TTTATACAGA 1860
CGTGAGGACA GAAGATTGGA TATTTTCTGC TGTTTCACAA GCCCCTGTGT CAGCAGGGTG 1920
ATTCAAGTTG AGCCACAGGC CTACACAGAG CCTCACAGTA ACACCCGGTA CAGCCCCCCA 1980
CCCCCATACA CCAGCCACAG CTTCGCCCAC GAAACCCATA TCACTATGCA GTCCACCGTT 2040
CAGCTCCGCA CAGAGTATGA CCCTCACACG CACGTGTACT ACACCACCGC CGAGCCACGC 2100
TCTGAGATCT CTGTACAGCC TGTTACCGTC ACCCAGGACA ACCTCAGCTG TCAGAGTCCC 2160
GAGAGCACCA GCTCTACCAG GGACCTGCTC TCCCAGTTCT CAGACTCCAG CCTCCACTGC 2220
CTCGAGCCCC CCTGCACCAA GTGGACACTC TCTTCGTTTG CAGAGAAGCA CTATGCTCCT 2280
TTCCTCCTGA AACCCAAAGC CAAGGTTGTG GTAATCCTTC TTTTCCTGGG CTTGCTGGGG 2340
GTCAGCCTTT ATGGGACCAC CCGAGTGAGA GACGGGCTGG ACCTCACGGA CATTGTTCCC 2400
CGGGAAACCA GAGAATATGA CTTCATAGCT GCCCAGTTCA AGTACTTCTC TTTCTACAAC 2460
ATGTATATAG TCACCCAGAA AGCAGACTAC CCGAATATCC AGCACCTACT TTACGACCTT 2520
CATAAGAGTT TCAGCAATGT GAAGTATGTC ATGCTGGAGG AGAACAAGCA ACTTCCCCAA 2580
ATGTGGCTGC ACTACTTTAG AGACTGGCTT CAAGGACTTC AGGATGCATT TGACAGTGAC 2640
TGGGAAACTG GGAGGATCAT GCCAAACAAT TATAAAAATG GATCAGATGA CGGGGTCCTC 2700
GCTTACAAAC TCCTGGTGCA GACTGGCAGC CGAGACAAGC CCATCGACAT TAGTCAGTTG 2760
ACTAAACAGC GTCTGGTAGA CGCAGATGGC ATCATTAATC CGAGCGCTTT CTACATCTAC 2820
CTGACCGCTT GGGTCAGCAA CGACCCTGTA GCTTACGCTG CCTCCCAGGC CAACATCCGG 2880
CCTCACCGGC CGGAGTGGGT CCATGACAAA GCCGACTACA TGCCAGAGAC CACGCTGAGA 2940
ATCCCAGCAG CAGAGCCCAT CGAGTACGCT CAGTTCCCTT TCTACCTCAA CGGCCTACGA 3000
GACACCTCAG ACTTTGTGGA AGCCATAGAA AAAGTGAGAG TCATCTGTAA CAACTATACG 3060
AGCCTGGGAC TGTCCAGCTA CCCCAATGGC TACCCCTTCC TGTTCTGGGA GCAATACATC 3120
AGCCTGCGCC ACTGGCTGCT GCTATCCATC ACCGTGGTGC TGGCCTGCAC GTTTCTAGTG 3180
TCCGCAGTCT TCCTCCTCAA CCCCTGGACG GCCGGGATCA TTGTCATGGT CCTGGCTCTG 3240
ATGACCGTTG AGCTCTTTGG CATGATGGGC CTCATTGGGA TCAAGCTGAG TGCTCTGCCT 3300
GTGGTCATCC TGATTGCATC TGTTGGCATC GGAGTGGAGT TCACCGTCCA CGTGGCTTTG 3360
GCCTTTCTGA CAGCCATTGG GGACAAGAAC CACAGGGCTA TCCTCGCTCT GGAACACATG 3420
TTTGCTCCCG TTCTGGACGG TGCTGTGTCC ACTCTGCTGG GTGTACTGAT GCTTGCAGGG 3480
TCCCAATTTG ATTTCATTGT CAGATACTTC TTTGCCGTCC TGCCCATTCT CACCGTCTTG 3540
GGGGTTCTCA ATGGACTGGT TCTGCTGCCT GTCCTCTTAT CCTTCTTTGG ACCGTGTCCT 3600
GAGCTGTCTC CAGCCAATGG CCTAAACCGA CTGCCCACTC CTTCGCCTGA GCCGCCTCCA 3660
AGTGTCGTCC GGTTTGCCGT GCCTCCTGGT CACACGAACA ATGGGTCTGA TTCCTCCGAC 3720
TCGGAGTACA GCTCTCAGAC CACGGTGTCT GGCATCAGTG AGGAGCTCAG GCAATACGAA 3780
CCACAGCAGG GTGCCGGAGG CCCTGCCCAC CAAGTGATTG TGGAAGCCAC AGAAAACCCT 3840
GTCTTTGCCC CGTCCACTGT GGTCCATCCG GACTCCAGAC ATCAGCCTCC CTTGACCCCT 3900
CGGCAACAGC CCCACCTGGA CTCTGGCTCC TTGTCCCCTG GACGGCAAGG CCAGCAGCCT 3960
CGAAGGGATC CCCCTACAGA AGGCTTGCGG CCACCCCCCT ACAGACCGCG CAGAGACGCT 4020
TTTCAAATTT C7ACTGAAGG GCATTCTGGC CCTAGCAATA GGGACCGCTC AGGGCCCCGT 4080
GGGGCCCGTT CTCACAACCC TCGGAACCCA AOCTCCACCG CCATGGGCAG CTCTGTGCCC 4140
AGCTACTGCC AGCCCATCAC CACTGTGACG GCTTCTGCTT CGGTGACTGT TGCTGTGCAT 4200
CCCCCGCCTG GACCTGGGCG CAACCCCCGA GGGGGGCCCT GTCCACGCTA TGAGAGCTAC 4260
CCTGACACTG ATCACGGGGT ATTTGAGGAT CCTCATGTGC CTTTTCATGT CAGGTGTGAG 4320
AGGAGGGACT CAAAGGTGGA GGTCATAGAG CTACAGGACG TGGAATGTGA GGAGAGGCCG 4380
TGGGGGAGCA GCTCCAACTG AGCGTAATTA AAATCTGAAG CAAAGAGGCC AAAGATTGGA 4440
AAGCCCCGCC CCCACCTCTT TCCAGAACTG CTTGAAGAGA ACTGCTTGCA ATTATGGGAA 4500
GGCAGTTCAT TGTTACTGTA ACTGATTGTA TTATTKKGTG AAATATTTCT ATAAATATTT 4560
AARAGGTGTA CACATGTAAT ATACATGGAA ATGCTGTACA GTCTATTTCC TGGGGCCTCT 4620
CCACTCCTGC CCCAGAGTGG GGAGACGACA GCGGCCCTTT CCCCTGTGTA CATTGGTCTC 4680
TGTGCCACAA CCAAGCTTAA CTTAGTTTTA AAAAAAATCT CCCAGCATAT GTCGCTGCTG 4740
CTTAAATATT GTATAATTTA CTTGTATAAT TCTATCCAAA TATTGCTTAT GTAATAGGAT 4800
TATTTCTAAA GGTTTCTGTT TAAAATATTT TAAATTTGCA TATCACAACC CTGTGGTAGG 4860
ATGAATTGTT ACTGTTAACT TTTGAACACG CTATGCGTGG TAATTGTTTA ACGAGCAGAC 4920
ATGAAGAAAA CAGGTTAATC CCAGTGCCTT CTCTACGCGT AGTTGTATAT GGTTCGCATG 4980
GGTGGATGTG TGTGTGCATG TGACTTTCCA ATGTACTGTA TTGTGGTTTG TTGTTGTTGT 5040
TGCTGTTGTT GTTCATTTTG GTGTTTTTGG TTGCTTTGTA TGATCTTAGC TCTGGCCTAG 5100
GTGGGCTGCG AAGGTCCAGG TCTTTTTCTG TCGTGATGCT GGTGGAAAGG TGACCCCAAT 5160
CATCTGTCCT ATTCTCTGGG ACTATTC 5187



(2) INFORMATION FOR SEQ ID NO:4:







(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1311 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

Met Val Ala Pro Asp Ser Glu Ala Pro Ser Asn Pro Arg Ile Thr Ala
1 5 10 15

Ala His Glu Ser Pro Cys Ala Thr Glu Ala Arg His Ser Ala Asp Leu
20 25 30

Tyr Ile Arg Thr Ser Trp Val Asp Ala Ala Leu Ala Leu Ser Glu Leu
35 40 45

Glu Lys Gly Asn Ile Glu Gly Gly Arg Thr Ser Leu Trp Ile Arg Ala
50 55 60

Trp Leu Gln Glu Gln Leu Phe Ile Leu Gly Cys Phe Leu Gln Gly Asp
65 70 75 80

Ala Gly Lys Val Leu Phe Val Ala Ile Leu Val Leu Ser Thr Phe Cys
85 90 95

Val Gly Leu Lys Ser Ala Gln Ile His Thr Arg Val Asp Gln Leu Trp
100 105 110

Val Gln Glu Gly Gly Arg Leu Glu Ala Glu Leu Lys Tyr Thr Ala Gln
115 120 125

Ala Leu Gly Glu Ala Asp Ser Ser Thr His Gln Leu Val Ile Gln Thr
130 135 140
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145 ^ A " P ^ HI ^ Hl8 P " C1 * *™ * « 

155 160 

His Leu Ly. val V.1 His Ala Ala Thr Arg Val Thr Val Hia Met ^ 



170 



175 



A.p lie Clu Trp Arg Leu x. y . Aop ^ ^ ^ ^ ^ ^ ^ ^ 



185 



190 



A.p Phe Clu Cly Tyr Hi . Hi . clu ne Afln v ^ ^ e 

200 205 



»~ cy. »i. a. „. Thr _ ^ ^ phe oi)i ^ 

215 220 
Leu Leu Cly Pro A.p Tyr Pro He Tyr Val Pro Hi. Leu Ly. Hi. Ly. 

235 240 
Leu Cln Trp Thr Hi. Leu A.n Pro Leu Clu Val Val Clu Clu Val Ly. 

250 2S5 
Ly. L.„ Ly. jj. „„ , h . Pto Ihr IU Hu Au ^ ^ 

265 27Q 

»r, »u oi, tl . s . r sl . K . t Lya Lye pro ^ lm pc(> 

280 285 

Thr Asp Pro His Cys Pro Ala Thr Al» Prn a«„ r 

290 ,oc A8n LyB L y fl Ser °lY His 

300 

lie Pro A.p val Ala Ala Clu Leu Ser Hi. Cly Cy. Tyr Cly Phe Ala 

310 3 " 320 

Ala Ala Tyr Met Hi. Trp Pro Clu Cln Leu He Val Cly cly Ala Thr 

330 335 
Arg A.„ ser Thr Ser Ala Leu Arg Ly. Ala Arg Xaa Leu Cln Thr Val 

345 350 
V! «„ £ „« Cly olu ^ Mt ^ ^ ^ a>[> ^ 

360 365 

Tyr g. V.l „i. CI- U. Jjj Itp ».„ „„ „„ ly , A1 , Al . ^ 

375 380 
Leu Aep Ala Trp Cln Arg Ly. Phe Ala Ala Clu Val Arg Ly. He Thr 

390 395 400 

Thr Ser Cly Ser V.l Ser Ser Ala Tyr Ser Phe Tyr Pro Phe Ser Thr 

410 415 

Ser Thr Leu A.n A.p n e Leu Cly Ly. Phe Ser Clu Val Ser Leu Ly. 

425 430 

Asn He He Leu Gly Tvr Mat: vh u~- , 

435 M6t t-u Ile Val Ala Val Thr 

440 445 

Leu He Gin Trp Arg A.p p r n e Arg Ser Gin Ala Cly Val Cly n 
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450 «55 «0 

Ala Gly Val Leu Leu Leu Ser lie Thr Val Ala Ala Gly Leu Gly Phe 

470 475 



465 

Cys Ala Leu Leu Gly lie Pro Phe Asn Ala Ser Ser Thr Gin lie Val 
' 485 490 



Pro Phe Leu Ala Leu Gly Leu Gly Val Gin Asp Met Phe Leu Leu Thr 
500 505 510 

His Thr Tyr Val Glu Gin Ala Gly Asp Val Pro Arg Glu Glu Arg Thr 
515 520 5 " 

Gly Leu Val Leu Lys Lye Ser Gly Leu Ser Val Leu Leu Ala Ser Leu 
530 535 540 

Cys Asn Val Met Ala Phe Leu Ala Ala Ala Leu Leu Pro lie Pro Ala 
545 550 S55 

Phe Arg Val Phe Cys Leu Gin Ala Ala lie Leu Leu Leu Phe Asn Leu 

Gly Ser He Leu Leu Val Ph. Pro Ala Met lie Ser Leu Asp Leu Arg 
580 585 

Arg Arg Ser Ala Ala Arg Ala Asp Leu Leu Cys Cys Leu Met Pro Glu 
595 600 605 

Pro Lys Lys Lys lie Pro Glu Arg Ala Lys Thr Arg Lys 



Ser Pro Leu 
610 



615 62° 



Asp Val ser Glu Asn Val Thr Lys Thr Cys Cys Leu Ser Val Ser Leu 



Asn Asp Lys Thr Hi- Arg lie Asp Thr Thr Arg Gin Pro Leu Asp Pro 
625 630 635 

v«i ser Glu Asn Val Thr Lys Thr 

645 650 

Thr Lys Trp Ala Lys A.n Gin Tyr Ala Pro Phe lie Met Arg Pro Ala 

660 665 

Val Lys Val Thr Ser Met Leu Ala Leu He Ala Val lie Leu Thr Ser 
675 680 685 

Gly Ala Thr Lys Val Lys Asp Gly Leu Asp Leu Thr Asp He 



Val Trp 
690 



695 700 



Val Pro Glu Asn Thr Asp Glu His Glu Phe Leu Ser Arg Gin Glu Lys 
705 710 715 

Phe civ Phe Tyr Asn Met Tyr Ala 

72S 730 

t.«o l.u Leu Tvr Glu Tyr His Asp 

750 



Tyr Phe Gly Phe Tyr Asn Met Tyr Ala Val Thr Gin Gly Asn Phe Glu 



Tyr Pro Thr Asn Gin Lys Leu Leu Tyr Glu Tyr His Asp Gin Phe Val 
740 7*5 7 50 

Arg lie Pro Asn He He Lys Asn Asp Asn Gly Gly Leu Thr Lys Phe 
755 760 765 
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Trp Leu ser Leu Pho Arg Asp Trp Leu Leu Asp Leu Cln Val Ala Phe 

770 775 780 

Asp Lys Glu Val Ala Ser Gly Cya lie Thr Gin Glu Tyr Trp Cye Lye 
. 790 795 800 

Asn Ala ser A.p Glu Gly He Leu Ala Tyr Lye Leu Met Val Gin Thr 
805 810 815 

Gly Hi. Val Asp A.n Pro He Aep Lya Ser Leu lie Thr Ala Gly His 
820 825 830 

Arg Leu Val Aep Lye A»p Gly He lie Aen Pro Lys Ala Phe Tyr Aen 
835 840 845 

Tyr Leu Ser Ala Trp Ala Thr Asn Asp Ala Leu Ala Tyr Gly Ala Ser 
850 855 860 

Gin Gly Asn Leu Lys Pro Gin Pro Gin Arg Trp He His Ser Pro Glu 

870 875 880 

Asp Val His Leu Glu lie Lys Lys Ser Ser Pro Leu He Tyr Thr Gin 



885 890 



895 



Leu Pro Phe Tyr Leu Ser Gly Leu Ser Asp Thr Xaa Ser He Lys Thr 
900 905 910 

Leu He Arg Ser Val Arg Asp Leu Cys Leu Lys Tyr Glu Ala Lys Gly 

920 925 

Leu Pro Asn Phe Pro Ser Gly He Pro Phe Leu Phe Trp Glu Gin Tyr 



940 



Leu Tyr Leu Arg Thr Ser Leu Lei 
9*5 950 



>u Leu Ala Leu Ala Cya Ala Leu Ala 
955 9 6 o 



Ala Val Phe He Ala Val Met Val Leu Leu Leu Asn Ala Trp Ala Ala 
965 970 975 

Val Leu Val Thr Leu Ala Leu Ala Thr Leu Val Leu Gin Leu Leu Gly 
980 985 99Q 

Val Met Ala Leu Leu Gly Val Lys Leu Ser Ala Met Pro Ala Val Leu 
995 1000 10 05 

Leu Val Leu Ala He Gly Arg Gly Val His Phe Thr Val His Leu Cye 
1010 1015 1020 

Leu Gly Phe val Thr Ser lie Gly cys Lys Arg Arg Arg Ala Ser Leu 
1025 1030 1035 1040 

Ala Leu Glu Ser Val Leu Ala Pro Val Val His Gly Ala Leu Ala Ala 
1045 1050 10S5 

Ala Leu Ala Ala^Ser Met Leu Ala Ala Ser Glu Cye Gly Phe Val Ala 
1060 1065 1070 

Arg Leu Phe Leu Arg Leu Leu Leu Asp lie Val Phe Leu Gly Leu II 
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1075 



1080 1085 



Aeo Cly Leu Leu Phe Phe Pro lie Val Leu Ser lie L u Gly Pro Ma 
1090 1095 1100 

Ala Glu val Arg Pro lie Glu His Pro Glu Arg Leu Ser Thr Pro Ser 



1105 



1110 "15 1120 



Pro Lys eye Ser Pro lie His Pro Arg Lye Ser Ser Ser Ser Ser Gly 
H25 1130 H35 

Gly Gly Asp Lys Ser Ser Arg Thr Ser Lys Ser Ala Pro Arg Pro Cys
1140 1145 1150

Ala Pro Ser Leu Thr Thr Ile Thr Glu Glu Pro Ser Ser Trp His Ser
1155 1160 1165

Ser Ala His Ser Val Gln Ser Ser Met Gln Ser Ile Val Val Gln Pro
1170 1175 1180

Glu Val Val Val Glu Thr Thr Thr Tyr Asn Gly Ser Asp Ser Ala Ser
1185 1190 1195 1200

Gly Arg Ser Thr Pro Thr Lys Ser Ser His Gly Gly Ala Ile Thr Thr
1205 1210 1215

Thr Lys Val Thr Ala Thr Ala Asn Ile Lys Val Glu Val Val Thr Pro
1220 1225 1230

Ser Asp Arg Lys Ser Arg Arg Ser Tyr His Tyr Tyr Asp Arg Arg Arg
1235 1240 1245

Asp Arg Asp Glu Asp Arg Asp Arg Asp Arg Glu Arg Asp Arg Asp Arg
1250 1255 1260

Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg
1265 1270 1275 1280

Glu Arg Ser Arg Glu Arg Asp Arg Arg Asp Arg Tyr Arg Asp Glu Arg
1285 1290 1295

Asp His Arg Ala Ser Pro Arg Glu Lys Arg Gln Arg Phe Trp Thr
1300 1305 1310

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4434 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CGAAACAAGA GAGCGAGTCA CACTACGCAO AGCCTCTGTG TTCTCTGTTG AGTGTOGCCC 60

ACGCACACAO CCCCAAAACA GTCCACACAG ACGCCCGCTG GGCAAGAGAC AGTGAGAGAG 120 

AGAAACAGCG CCGCGCGCTC GCCTAATGAA GTTGTTCGCC TGGCIGGCGT GCCGCATCCA 180 

CGAGATACAG ATACATCTCT CATGGACCGC GACACCCTCC CAOCCGTTCC GGACACACAC 240 

GCCGATCTGG TCGATCAGAA ATTATTCTOG GATCTTTACA TACCCACCAG CTGGGTGGAC 300 

GCCCAAOTGC CCCTCGATCA GATAGATAAG GGCAAAGCGC GTGGCAGCCG CAOGGOGATC 360 

TATCTGCCAT CAOTATTCCA GTCCCACCTC GAAACCCTOO GCAGCTCCGT GCAAAAGCAC 420 

GCGCGCAAGO TGCTATTOGT GGCTATCCTO GTCCTGAGCA CCTTCTGCOT CGGCCTGAAG 480 

AGCGCCCACA TCCACTCCAA GCTGCACCAO CTCTGGATCC AGGAGGGCGG CCGGCTGGAG 540 

GCGGAACTGG CCTACACACA GAAGACOATC GCCGAGGACG AGTCGGCCAC GCATCAGCTG 600 

CTCATTCAGA OGACCCACGA CCCGAACCCC TCCGTCCTCC ATCCGCAGGC GCTGCTTGCC 660 

CACCTGGAGG TCCTGGTCAA GCCCACCGCC GTCAAGGTGC ACCTCTAOGA CACCGAATGG 720 

GGGCTGCGCG ACATGTGCAA CATGCCGAGC AOGCCCTCCT TCGAGGGCAT CTACTACATC 780 

GAGCAGATCC TCCGCCACCT CATTCOGTGC TCGATCATCA CGCCGCTGGA CTGTTTCTCG 840 

GAGGGAAGCC AGCTGTTGGG TCOGCAATCA GCGGTCGTTA TACCAGCCCT CAACCAACGA 900 

CTCCTGTGGA CCACCCTGAA TCCCGCCTCT GTCATGCAGT ATATGAAACA AAAGATGTCC 960 

GAGGAAAAGA TCAGCTTCGA CTTCGAGACC GTCGAGCAGT ACATCAAGCG TGCGGCCATT 1020 

GGCAGTGGCT ACATGGAGAA GCCCTGCCTG AACCCACTCA ATCCCAATTG CCCGGACACG 1080 

GCACCGAACA AGAACAGCAC CCAGCCCCCG GATGTGGCAG CCATCCTGTC OGGAGGCTGC 1140 

TACGGTTATG CCGCGAAGCA CATGCACTGG CCGGAGGAGC TGATTGTGGG CGGAOGGAAG 1200 

AGGAACCGCA GCGGACACTT GAGGAAGGCC CAGGCCCTGC AGTCGGTGGT GCAGCTGATC 1260 

ACCGAGAAGG AAATGTACGA CCAGTGGCAG GACAACTACA AGCTGCACCA TCTTGGATGG 1320 

AOGCACGAGA AGGCAGCGGA GGTTTTGAAC GCCTGGCACC GCAACTTTTC GCGGGAGGTG 1380 

GAACAGCTGC TAOGTAAACA GTOGAGAATT CCCACCAACT ACGATATCTA CGTGTTCAGC 1440 

TCGGCTGCAC TGGATGACAT CCTGGCCAAG TTCTCCCATC CCAGCGCCTT GTCCATTGTC 1500

ATCCGOGTGG CCGTCACCGT TTTGTATGCC TTTTCCACGC TCCTOCGCTG GAGGGACCCC 1560 

CTCCGTGGCC AGAGCAGTGT GGGOGTGGCC GGAGTTCTGC TCATGTGCTT CAGTACOGCC 1620 

GCCGGATTGG GATTGTCAGC CCTGCTCGG, ATCGTTTTCA ATGCGCTGAC CCCTGCCTAT 1680 

GCGGAGAGCA ATCGCCGCCA GCAGACCAAG CTGATTCTCA AGAACGCCAG CACCCAGCTG 1740

GTTCCGTTTT TGGCCCTTGG TCTGGOCGTC GATCACATCT TCATAGTGGG ACCGAGCATC 1800

CTGTTCAGTG CCTGCAGCAC CGCAGGATCC TTCTTTGCGG CCGCCTTTAT TCOGGTGCCG 1860

GCTTTGAAGG TATTCTGTCT GCAGGCTGCC ATCGTAATGT GCTCCAATTT GGCAGCGCCT 1920

CTATTGGTTT TTCCGGCCAT GATTTCGTTG GATCTACGCA GACGTACCGC CGGCAGGGCG 1980

GACATCTTCT GCTGCTGTTT TCCGGTGTGG AAGGAACAGC CGAAGGTCGC ACCTCCGGTG 2040

CTGCCGCTGA ACAACAACAA CGGGCGCGGG GCCCGGCATC CGAAGAGCTG CAACAACAAC 2100

AGGGTGCCGC TGCCCGCCCA GAATCCTCTG CTGGAACAGA GGGCAGACAT CCCTGGGAGC 2160

AGTCACTCAC TGGCGTCCTT CTCCCTGGCA ACCTTCGCCT TTCAGCACTA CACTCCCTTC 2220

CTCATGCGCA GCTGGGTGAA GTTCCTGACC GTTATGGGTT TCCTGGCGGC CCTCATATCC 2280

AGCTTGTATG CCTCCACGCG CCTTCAGGAT GGCCTGGACA TTATTGATCT GGTGCCCAAG 2340

GACAGCAACG AGCACAAGTT CCTGGATGCT CAAACTCGGC TCTTTGGCTT CTACAGCATG 2400

TATGCGGTTA CCCAGGGCAA CTTTGAATAT CCCACCCAGC AGCAGTTGCT CAGGGACTAC 2460

CATGATTCCT TTGTCCGGGT GCCACATGTG ATCAAGAATG ATAACGGTGG ACTGCCGGAC 2520

TTCTGGCTGC TGCTCTTCAG CGAGTGGCTG GGTAATCTGC AAAAGATATT CGACGAGGAA 2580

TACCGCGACG GACGGCTGAC CAAGGAGTGC TGGTTCCCAA ACGCCAGCAG CGATGCCATC 2640

CTGGCCTACA AGCTAATCGT GCAAACCGGC CATGTGGACA ACCCCGTGGA CAAGGAACTG 2700

GTGCTCACCA ATCGCCTGGT CAACAGCGAT GGCATCATCA ACCAACGCGC CTTCTACAAC 2760

TATCTGTCGG CATGGGCCAC CAACGACGTC TTCGCCTACG GAGCTTCTCA GGGCAAATTG 2820

TATCCGGAAC CGCGCCAGTA TTTTCACCAA CCCAACGAGT ACGATCTTAA GATACCCAAG 2880

AGTCTGCCAT TGGTCTACGC TCAGATGCCC TTTTACCTCC ACGGACTAAC AGATACCTCG 2940

CAGATCAAGA CCCTGATAGG TCATATTCGC GACCTGAGCG TCAAGTACGA GGGCTTCGGC 3000

CTGCCCAACT ATCCATCGGG CATTCCCTTC ATCTTCTGGG AGCAGTACAT GACCCTGCGC 3060

TCCTCACTGG CCATGATCCT GGCCTGCGTG CTACTCGCCG CCCTGGTGCT GGTCTCCCTG 3120

CTCCTGCTCT CCGTTTGGGC CGCCGTTCTC GTGATCCTCA GCGTTCTGGC CTCGCTGGCC 3180

CAGATCTTTG GGGCCATGAC TCTGCTGGGC ATCAAACTCT CGGCCATTCC GGCAGTCATA 3240

CTCATCCTCA GCGTGGGCAT GATGCTGTGC TTCAATGTGC TGATATCACT GGGCTTCATG 3300

ACATCCGTTG GCAACCGACA GCGCCGCGTC CAGCTGAGCA TGCAGATGTC CCTGGGACCA 3360

CTTGTCCACG GCATGCTGAC CTCCGGAGTG GCCGTGTTCA TGCTCTCCAC GTCGCCCTTT 3420

GAGTTTGTGA TCCGGCACTT CTGCTGGCTT CTGCTGGTGG TCTTATGCGT TGGCGCCTGC 3480

AACAGCCTTT TCGTGTTCCC CATCCTACTG AGCATGGTGG CACCGGAGCC GGAGCTGGTG 3540

CCGCTGCAGC ATCCAGACCG CATATCCACG CCCTCTCCGC TGCCCGTGCG CAGCAGCAAG 3600

AGATCGGCCA AATCCTATGT GGTGCACGGA TCGCGATCCT CGCGAGGCAC CTGCCAGAAG 3660

TCGCATCACC ACCACCACAA ACACCTTAAT CATCCATCCC TGACGACGAT CACCGACCAG 3720

CCGCAGTCGT GGAAGTCCAC CAACTCGTCC ATCCAGATGC CCAATGATTG GACCTACCAG 3780

CCGCGGGAAC AGCGACCCGC CTCCTACCCG CCCCCGCCCC CCGCCTATCA CAAGCCCGCC 3840

GCCCAGCACC ACCACCACCA TCAGGGCCCG CCCACAACGC CCCCGCCTCC CTTCCCCACG 3900

GCCTATCCGC CGGAGCTCCA GAGCATCGTG GTGCAGCCGG ACGTGACGGT GGAGAOGACG 3960

CACTCGGACA GCAACACCAC CAAGGTGACG GCCACGGCCA ACATCAAGGT GGAGCTGGCC 4020

ATGCCCGGCA GGGCGGTGCG CAGCTATAAC TTTACCACTT AGCACTAGCA CTAGTTCCTG 4080

TAGCTATTAG GACGTATCTT TAGACTCTAG CCTAAGCCGT AACCCTATTT GTATCTGTAA 4140

AATCGATTTG TCCAGCGGGT CTGCTGAGGA TTTCGTTCTC ATGGATTCTC ATGGATTCTC 4200

ATGGATGCTT AAATGGCATG GTAATTGGCA AAATATCAAT TTTTGTGTCT CAAAAAGATC 4260

CATTACCTTA TCGTTTCAAG ATACATTTTT AAAGAGTCCG CCAGATATTT ATATAAAAAA 4320

AATCCAAAAT CGACGTATCC ATGAAAATTG AAAAGCTAAG CACACCCGTA TGTATGTATA 4380

TGTGTATGCA TGTTAGTTAA TTTCCCGAAG TCCGGTATTT ATAGCAGCTG CCTT 4434

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1285 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Asp Arg Asp Ser Leu Pro Arg Val Pro Asp Thr His Gly Asp Val
1 5 10 15

Val Asp Glu Lys Leu Phe Ser Asp Leu Tyr Ile Arg Thr Ser Trp Val
20 25 30

Asp Ala Gln Val Ala Leu Asp Gln Ile Asp Lys Gly Lys Ala Arg Gly
35 40 45

Ser Arg Thr Ala Ile Tyr Leu Arg Ser Val Phe Gln Ser His Leu Glu
50 55 60

Thr Leu Gly Ser Ser Val Gln Lys His Ala Gly Lys Val Leu Phe Val
65 70 75 80

Ala Ile Leu Val Leu Ser Thr Phe Cys Val Gly Leu Lys Ser Ala Gln
85 90 95

Ile His Ser Lys Val His Gln Leu Trp Ile Gln Glu Gly Gly Arg Leu
100 105 110

Glu Ala Glu Leu Ala Tyr Thr Gln Lys Thr Ile Gly Glu Asp Glu Ser
115 120 125

Ala Thr His Gln Leu Leu Ile Gln Thr Thr His Asp Pro Asn Ala Ser
130 135 140

Val Leu His Pro Gln Ala Leu Leu Ala His Leu Glu Val Leu Val Lys
145 150 155 160

Ala Thr Ala Val Lys Val His Leu Tyr Asp Thr Glu Trp Gly Leu Arg
165 170 175

Asp Met Cys Asn Met Pro Ser Thr Pro Ser Phe Glu Gly Ile Tyr Tyr
180 185 190

Ile Glu Gln Ile Leu Arg His Leu Ile Pro Cys Ser Ile Ile Thr Pro
195 200 205

Leu Asp Cys Phe Trp Glu Gly Ser Gln Leu Leu Gly Pro Glu Ser Ala
210 215 220

Val Val Ile Pro Gly Leu Asn Gln Arg Leu Leu Trp Thr Thr Leu Asn
225 230 235 240

Pro Ala Ser Val Met Gln Tyr Met Lys Gln Lys Met Ser Glu Glu Lys
245 250 255

Ile Ser Phe Asp Phe Glu Thr Val Glu Gln Tyr Met Lys Arg Ala Ala
260 265 270

Ile Gly Ser Gly Tyr Met Glu Lys Pro Cys Leu Asn Pro Leu Asn Pro
275 280 285

Asn Cys Pro Asp Thr Ala Pro Asn Lys Asn Ser Thr Gln Pro Pro Asp
290 295 300

Val Gly Ala Ile Leu Ser Gly Gly Cys Tyr Gly Tyr Ala Ala Lys His
305 310 315 320

Met His Trp Pro Glu Glu Leu Ile Val Gly Gly Arg Lys Arg Asn Arg
325 330 335

Ser Gly His Leu Arg Lys Ala Gln Ala Leu Gln Ser Val Val Gln Leu
340 345 350

Met Thr Glu Lys Glu Met Tyr Asp Gln Trp Gln Asp Asn Tyr Lys Val
355 360 365

His His Leu Gly Trp Thr Gln Glu Lys Ala Ala Glu Val Leu Asn Ala
370 375 380

Trp Gln Arg Asn Phe Ser Arg Glu Val Glu Gln Leu Leu Arg Lys Gln
385 390 395 400

Ser Arg Ile Ala Thr Asn Tyr Asp Ile Tyr Val Phe Ser Ser Ala Ala
405 410 415

Leu Asp Asp Ile Leu Ala Lys Phe Ser His Pro Ser Ala Leu Ser Ile
420 425 430

Val Ile Gly Val Ala Val Thr Val Leu Tyr Ala Phe Cys Thr Leu Leu
435 440 445

Arg Trp Arg Asp Pro Val Arg Gly Gln Ser Ser Val Gly Val Ala Gly
450 455 460

Val Leu Leu Met Cys Phe Ser Thr Ala Ala Gly Leu Gly Leu Ser Ala
465 470 475 480

Leu Leu Gly Ile Val Phe Asn Ala Leu Thr Ala Ala Tyr Ala Glu Ser
485 490 495

Asn Arg Arg Glu Gln Thr Lys Leu Ile Leu Lys Asn Ala Ser Thr Gln
500 505 510

Val Val Pro Phe Leu Ala Leu Gly Leu Gly Val Asp His Ile Phe Ile
515 520 525

Val Gly Pro Ser Ile Leu Phe Ser Ala Cys Ser Thr Ala Gly Ser Phe
530 535 540

Phe Ala Ala Ala Phe Ile Pro Val Pro Ala Leu Lys Val Phe Cys Leu
545 550 555 560

Gln Ala Ala Ile Val Met Cys Ser Asn Leu Ala Ala Ala Leu Leu Val
565 570 575

Phe Pro Ala Met Ile Ser Leu Asp Leu Arg Arg Arg Thr Ala Gly Arg
580 585 590

Ala Asp Ile Phe Cys Cys Cys Phe Pro Val Trp Lys Glu Gln Pro Lys
595 600 605

Val Ala Pro Pro Val Leu Pro Leu Asn Asn Asn Asn Gly Arg Gly Ala
610 615 620

Arg His Pro Lys Ser Cys Asn Asn Asn Arg Val Pro Leu Pro Ala Gln
625 630 635 640

Asn Pro Leu Leu Glu Gln Arg Ala Asp Ile Pro Gly Ser Ser His Ser
645 650 655

Leu Ala Ser Phe Ser Leu Ala Thr Phe Ala Phe Gln His Tyr Thr Pro
660 665 670

Phe Leu Met Arg Ser Trp Val Lys Phe Leu Thr Val Met Gly Phe Leu
675 680 685

c , ser Leu Tyr Ala Ser Thr Arg Leu Gin Asp Gly
Ala Ala Leu lie Ser Ser Leu Tyr ^ 

690 



,u a w - « l « w S "° " l " * 

710 

t>k<» riv Phe Tyr Ser Met Tyr Ala Val 
Ala Gin Thr Arg Leu Phe Gly Phe Ty ^ 



Leu Asp — - 
705 



Leu Asp Ala n» *»* 730 
725 

olu Tyr Pro Thr Gin Gin Gin Leu Leu Arg Asp 
Thr Gin Gly Asn Phe Olu Tyr *r ^ 750 
740 

•h. Ara Val Pro His Val He Lys Asn Asp Asn Gly 
Tyr His Asp Ser Phe Arg Val Pro ?65 

755 

nVl . c_ r Qiu Trp Leu Gly Asn 
Gly Leu Pro Asp Phe Trp Leu Leu Leu Phe Ser Glu 

770 

- «. .v. - -s 01 " G1 " ™ " 9 - ° ly ~ s 

785 _ „ 

» *i« ser Ser Asp Ala lie Leu Ala Tyr Lys 
Glu cys Trp Phe Pro Asn Ala Ser Ser * p a» 

805 

«. riv His Val Asp Asn Pro Val Asp Lys Glu Leu 
Leu He Val Gin Thr Gly His Val a p M0 
820 

« uu « « - - »"o ~ - 1U - *~ " 

835 

_ er Ala Trp Ala Thr Asn Asp Val Phe Ala 
Ala Phe Tyr Asn Tyr Leu Ser Ala Trp ^ 

850 855 



nc oly s., «. «, ,v. - « - ~ « "» - K 

B65 S7U 

al . „ «n JJJ « - - "J « - ~ K 

« * - - ~ «• - ^ S " u oly Ihc S IBr ~ 

900 

01n „. g. - ~ »• «v S n. -p - S - - - 

olu 01y Z - ~ s - ~ ~ °» 5iS "° PM ~ 

930 

Trp „ lu «. Ty r - - M u s., « J- »• - - - »; 

945 

C v., « - -J »■ - « l - S ~ « 
« „ »u ». Ti - v.. a. -j ~ v.x - - ~ - « »• 

980 *°
995 1000 1005

Pro Ala Val Ile Leu Ile Leu Ser Val Gly Met Met Leu Cys Phe Asn
1010 1015 1020

Val Leu Ile Ser Leu Gly Phe Met Thr Ser Val Gly Asn Arg Gln Arg
1025 1030 1035 1040

Arg Val Gln Leu Ser Met Gln Met Ser Leu Gly Pro Leu Val His Gly
1045 1050 1055

Met Leu Thr Ser Gly Val Ala Val Phe Met Leu Ser Thr Ser Pro Phe
1060 1065 1070

Glu Phe Val Ile Arg His Phe Cys Trp Leu Leu Leu Val Val Leu Cys
1075 1080 1085

Val Gly Ala Cys Asn Ser Leu Leu Val Phe Pro Ile Leu Leu Ser Met
1090 1095 1100

Val Gly Pro Glu Ala Glu Leu Val Pro Leu Glu His Pro Asp Arg Ile
1105 1110 1115 1120

Ser Thr Pro Ser Pro Leu Pro Val Arg Ser Ser Lys Arg Ser Gly Lys
1125 1130 1135

Ser Tyr Val Val Gln Gly Ser Arg Ser Ser Arg Gly Ser Cys Gln Lys
1140 1145 1150

Ser His His His His His Lys Asp Leu Asn Asp Pro Ser Leu Thr Thr
1155 1160 1165

Ile Thr Glu Glu Pro Gln Ser Trp Lys Ser Ser Asn Ser Ser Ile Gln
1170 1175 1180

Met Pro Asn Asp Trp Thr Tyr Gln Pro Arg Glu Gln Arg Pro Ala Ser
1185 1190 1195 1200

Tyr Ala Ala Pro Pro Pro Ala Tyr His Lys Ala Ala Ala Gln Gln His
1205 1210 1215

His Gln His Gln Gly Pro Pro Thr Thr Pro Pro Pro Pro Phe Pro Thr
1220 1225 1230

Ala Tyr Pro Pro Glu Leu Gln Ser Ile Val Val Gln Pro Glu Val Thr
1235 1240 1245

Val Glu Thr Thr His Ser Asp Ser Asn Thr Thr Lys Val Thr Ala Thr
1250 1255 1260

Ala Asn Ile Lys Val Glu Leu Ala Met Pro Gly Arg Ala Val Arg Ser
1265 1270 1275 1280

Tyr Asn Phe Thr Ser
1285

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 345 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

AAGGTCCATC AGCTTTGGAT ACAGGAAGGT GGTTCGCTCG AGCATGAGCT AGCCTACACG 60

CAGAAATCGC TCGGCGAGAT GGACTCCTCC ACGCACCAGC TGCTAATCCA AACNCCCAAA 120

GATATGGACG CCTCGATACT CCACCCGAAC GCGCTACTGA CCCACCTGGA CGTGGTGAAG 180

AAAGCGATCT CGGTGACGGT GCACATGTAC GACATCACGT GGAGNCTCAA GGACATGTGC 240

TACTCGCCCA GCATACCGAG NTTCGATACG CACTTTATCG AGCAGATCTT CGAGAACATC 300

ATACCGTGCG CGATCATCAC GCCGCTGGAT TGCTTTTGGG AGGGA 345

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 115 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Lys Val His Gln Leu Trp Ile Gln Glu Gly Gly Ser Leu Glu His Glu
1 5 10 15

Leu Ala Tyr Thr Gln Lys Ser Leu Gly Glu Met Asp Ser Ser Thr His
20 25 30

Gln Leu Leu Ile Gln Thr Pro Lys Asp Met Asp Ala Ser Ile Leu His
35 40 45

Pro Asn Ala Leu Leu Thr His Leu Asp Val Val Lys Lys Ala Ile Ser
50 55 60

Val Thr Val His Met Tyr Asp Ile Thr Trp Xaa Leu Lys Asp Met Cys
65 70 75 80

Tyr Ser Pro Ser Ile Pro Xaa Phe Asp Thr His Phe Ile Glu Gln Ile
85 90 95

Phe Glu Asn Ile Ile Pro Cys Ala Ile Ile Thr Pro Leu Asp Cys Phe
100 105 110

Trp Glu Gly
115

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5187 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GGGTCTGTCA CCCGGAGCCG GACTCCCCGG CGGCCAGCAG CGTCCTCGCG ACCCGAGCGC 60 

CCAGGCGCGC CCGGAGCCCG CGGCGGCGCC CGCAACATGG CCTCGGCTGG TAACGCCGCC 120 

GGGGCCCTGG GCAGGCAGGC CGGCGGCGGG AGGCGCACAC GGACCGGGGG ACCGCACCGC 180 

GCCGCCCCGC ACCGGGACTA TCTGCACCGG CCCAGCTACT GCGACGCCGC CTTCGCTCTG 240 

GAGCAGATTT CCAAGGCGAA GGCTACTCGC CGGAAAGCGC CGCTGTGGCT GAGAGCGAAG 300 

TTTCAGAGAC TCTTATTTAA ACTCGGTTGT TACATTCAAA AGAACTGCGG CAAGTTTTTG 360 

GTTCTGCCTC TCCTCATATT TGGGGCCTTC GCTGTGGGAT TAAAGGCAGC TAATCTCGAG 420 

ACCAACGTGG AGGAGCTGTG GGTGGAAGTT GGTGGACGAG TGAGTCGAGA ATTAAATTAT 480 

ACCCGTCAGA AGATAGGAGA AGAGGCTATG TTTAATCCTC AACTCATGAT ACAGACTCCA 540 

AAAGAAGAAG GCGCTAATGT TCTGACCACA GACCCTCTCC TCCAACACCT GGACTCAGCA 600 

CTCCAGGCCA GTCGTCTGCA CGTCTACATG TATAACAGGC AATCGAAGTT GGAACATTTG 660 

TGCTACAAAT CAGGGGAACT TATCACGGAG ACAGGTTACA TGCATCAGAT AATAGAATAC 720 

CTTTACCCTT GCTTAATCAT TACACCTTTC GACTGCTTCT GGGAAGGGGC AAAGCTACAG 780 

TCCGGGACAG CATACCTCCT AGGTAAGCCT CCTTTACGGT GGACAAACTT TGACCCCTTG 840 

GAATTCCTAG AAGAGTTAAA GAAAATAAAC TACCAAGTGG ACAGCTGGGA GGAAATGCTG 900 

AATAAAGCCG AAGTTGGCCA TGGGTACATG GACCGGCCTT GCCTCAACCC AGCCGACCCA 960 

GATTGCCCTG CCACAGCCCC TAACAAAAAT TCAACCAAAC CTCTTGATGT GGCCCTTGTT 1020 

TTGAATGGTG GATGTCAAGG TTTATCCAGG AAGTATATCC ATTGGCAGCA GGAGTTGATT 1080 

GTCCGTCGTA CCGTCAAGAA TCCCACTGGA AAACTTGTCA GCGCTCACGC CCTCCAAACC 1140

ATGTTCCAGT TAATGACTCC CAAGCAAATG TATGAACACT TCAGGGGCTA CGACTATGTC 1200

TCTCACATCA ACTGGAATGA AGACAGGGCA GCCGCCATCC TGGAGGCCTG CCAGAGGACT 1260

TACGTGGAGG TGGTTCATCA AAGTGTCGCC CCAAACTCCA CTCAAAAGGT GCTTCCCTTC 1320

ACAACCACGA CCCTGGACGA CATCCTAAAA TCCTTCTCTG ATGTCAGTGT CATCCGAGTG 1380

GCCAGCGGCT ACCTACTGAT GCTTGCCTAT GCCTGTTTAA CCATGCTGCG CTGGGACTGC 1440

TCCAAGTCCC AGGGTGCCGT GGGGCTGGCT GGCGTCCTGT TGGTTGCGCT GTCAGTGGCT 1500

GCAGGATTGG GCCTCTGCTC CTTGATTGGC ATTTCTTTTA ATGCTGCGAC AACTCAGGTT 1560

TTGCCGTTTC TTGCTCTTGG TGTTGGTGTG GATGATGTCT TCCTCCTGGC CCATGCATTC 1620

AGTGAAACAG GACAGAATAA GAGGATTCCA TTTGAGGACA GGACTGGGGA GTGCCTCAAG 1680

CGCACCGGAG CCAGCGTGGC CCTCACCTCC ATCAGCAATG TCACCGCCTT CTTCATGGCC 1740

GCATTGATCC CTATCCCTGC CCTGCGAGCG TTCTCCCTCC AGGCTGCTGT GGTGGTGGTA 1800

TTCAATTTTG CTATGGTTCT GCTCATTTTT CCTGCAATTC TCAGCATGGA TTTATACAGA 1860

CGTGAGGACA GAAGATTGCA TATTTTCTGC TGTTTCACAA GCCCCTGTGT CAGCAGGGTG 1920

ATTCAAGTTG AGCCACAGGC CTACACAGAG CCTCACAGTA ACACCCGGTA CAGCCCCCCA 1980

CCCCCATACA CCAGCCACAG CTTCGCCCAC GAAACCCATA TCACTATGCA GTCCACCGTT 2040

CAGCTCCGCA CAGAGTATGA CCCTCACACG CACGTGTACT ACACCACCGC CGAGCCACGC 2100

TCTGAGATCT CTGTACAGCC TGTTACCGTC ACCCAGGACA ACCTCAGCTG TCAGAGTCCC 2160

GAGAGCACCA GCTCTACCAG GGACCTGCTC TCCCAGTTCT CAGACTCCAG CCTCCACTGC 2220

CTCGAGCCCC CCTGCACCAA GTGGACACTC TCTTCGTTTG CAGAGAAGCA CTATGCTCCT 2280

TTCCTCCTGA AACCCAAAGC CAAGGTTGTG GTAATCCTTC TTTTCCTGGG CTTGCTGGGG 2340

GTCAGCCTTT ATGGGACCAC CCGAGTGAGA GACGGGCTGG ACCTCACGGA CATTGTTCCC 2400

CGGGAAACCA GAGAATATGA CTTCATAGCT GCCCAGTTCA AGTACTTCTC TTTCTACAAC 2460

ATGTATATAG TCACCCAGAA AGCAGACTAC CCGAATATCC AGCACCTACT TTACGACCTT 2520

CATAAGACTT TCAGCAATGT GAAGTATGTC ATGCTGGAGG AGAACAAGCA ACTTCCCCAA 2580

ATGTGGCTGC ACTACTTTAG AGACTGGCTT CAAGGACTTC AGGATGCATT TGACAGTGAC 2640

TGGGAAACTG GGAGGATCAT GCCAAACAAT TATAAAAATG GATCAGATGA CGGGGTCCTC 2700

GCTTACAAAC TCCTGGTGCA GACTGGCAGC OGAGACAAGC CCATCGACAT TAGTCAGTTG 2760

ACTAAACAGC GTCTGGTAGA CGCAGATGGC ATCATTAATC CGAGCGCTTT CTACATCTAC 2820

CTGACCGCTT GGGTCAGCAA CGACCCTGTA GCTTACGCTG CCTCCCAGGC CAACATCCGG 2880

CCTCACCGGC CGCAGTCGGT CCATCACAAA GCCOACTACA TGCCACACAC CACCCTCACA 2940

ATCCCACCAG CAGAGCCCAT CGAGTACCCT CACTTCCCTT TCTACCTCAA CCCCCTACGA 3000

GACACCTCAG ACTTTGTGGA AGCCATACAA AAACTGACAG TCATCTGTAA CAACTATACG 3060

AGCCTCGCAC TCTCCACCTA CCCCAATGGC TACCCCTTCC TGTTCTGGGA GCAATACATC 3120

AGCCTGOGCC ACTGGCTCCT GCTATCCATC AGCGTGGTGC TGGCCTOCAC GTTTCTAGTG 3180

TCCCCAGTCT TCCTCCTGAA CCCCTGCACG GCOGGCATCA TTGTCATGCT CCTGCCTCTG 3240

ATGACCGTTG AGCTCTTTGG CATGATGGGC CTCATTGGGA TCAAGCTCAG TGCTGTGCCT 3300

GTGGTCATCC TGATTGCATC TGTTGGCATC GGAGTGGAGT TCACCCTCCA CGTGGCTTTG 3360

GCCTTTCTCA CAGCCATTGG GGACAAGAAC CACAGGCCTA TGCTCGCTCT GGAACACATG 3420

TTTGCTCCCG TTCTGGACCG TGCTGTGTCC ACTCTGCTGG CTCTACTGAT GCTTGCAGGG 3480

TCCGAATTTG ATTTCATTGT CAGATACTTC TTTGCCGTCC TGGCCATTCT CACCGTCTTG 3540

GGGGTTCTCA ATGGACTGGT TCTCCTCCCT GTCCTCTTAT CCTTCTTTGG ACCGTGTCCT 3600

GAGGTGTCTC CAGCCAATGG CCTAAACCGA CTCCCCACTC CTTCGCCTGA GCCGCCTCCA 3660

AGTGTCGTCC GCTTTCCCGT GCCTCCTGGT CACACGAACA ATGCGTCTGA TTCCTCCGAC 3720

TCGGAGTACA GCTCTCAGAC CACGCTGTCT GGCATCAGTG AGGAGCTCAG GCAATACGAA 3780

GCACAGCAGG GTGCCGGAGG CCCTGCCCAC CAAGTGATTG TGGAAGCCAC AGAAAACCCT 3840

CTCTTTGCCC CGTCCACTGT GGTCCATCCG GACTCCAGAC ATCAGCCTCC CTTGACCCCT 3900

CGGCAACAGC CCCACCTGGA CTCTGGCTCC TTGTCCCCTC GACCCCAAGC CCAGCACCCT 3960

CGAAGCGATC CCCCTAGAGA AGCCTTGCGG CCACCCCCCT ACAGACCCCG CACAGACGCT 4020

TTTGAAATTT CTACTGAAGG CCATTCTGGC CCTAGCAATA GCGACCGCTC AGGGCCCCGT 4080

GGGGCCCGTT CTCACAACCC TCGGAACCCA ACGTCCACCC CCATGGGCAG CTCTGTGCCC 4140

AGCTACTGCC AGCCCATCAC CACTGTGACG GCTTCTGCTT CGGTGACTCT TGCTGTGCAT 4200

CCCCCGCCTG GACCTGGGCG CAACCCOOGA OCOGGGCCCT GTCCAGGCTA TCACAGCTAC 4260

CCTGAGACTG ATCACGGCGT ATTTGAGGAT CCTCATGTGC CTTTTCATGT CAGGTGTGAG 4320

AGGAGGGACT CAAACGTGCA GGTCATAGAC CTACAGGACC TCGAATGTCA GGAGAGGCCG 4380

TGGGGGAGCA CCTCCAACTC AGGGTAATTA AAATCTGAAG CAAAGACGCC AAAGATTCGA 4440

AAGCCCCGCC CCCACCTCTT TCCAGAACTG CTTGAAGAGA ACTGCTTGGA ATTATGCGAA 4500

GGCAGTTCAT TGTTACTGTA ACTGATTGTA TTATTKKGTG AAATATTTCT ATAAATATTT 4560

AARAGGTGTA CACATCTAAT ATACATGGAA ATGCTGTACA GTCTATTTCC TCGCCCCTCT 4620

CCACTCCTGC CCCAGACTGG GGAGACCACA GGGCCCCTTT CCCCTGTCTA CATTGGTCTC 4680

TGTGCCACAA CCAAGCTTAA CTTAGTTTTA AAAAAAATCT CCCAGCATAT GTCGCTGCTG 4740

CTTAAATATT GTATAATTTA CTTGTATAAT TCTATGCAAA TATTGCTTAT GTAATAGGAT 4800

TATTTGTAAA GGTTTCTGTT TAAAATATTT TAAATTTGCA TATCACAACC CTGTGGTAGG 4860

ATGAATTGTT ACTGTTAACT TTTGAACACG CTATGCGTGG TAATTGTTTA ACGAGCAGAC 4920

ATGAAGAAAA CAGGTTAATC CCAGTGGCTT CTCTAGGGGT AGTTGTATAT GGTTCGCATG 4980

GGTGGATGTG TGTGTGCATG TGACTTTCCA ATGTACTGTA TTGTGGTTTG TTGTTGTTGT 5040

TGCTGTTCTT GTTCATTTTG GTGTTTTTGG TTGCTTTGTA TGATCTTAGC TCTGGCCTAG 5100

GTGGGCTGGG AAGGTCCAGG TCTTTTTCTG TCGTGATGCT GGTGGAAAGG TGACCCCAAT 5160

CATCTGTCCT ATTCTCTGGG ACTATTC 5187
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1434 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Ala Ser Ala Gly Asn Ala Ala Gly Ala Leu Gly Arg Gln Ala Gly
1 5 10 15

Gly Gly Arg Arg Arg Arg Thr Gly Gly Pro His Arg Ala Ala Pro Asp
20 25 30

Arg Asp Tyr Leu His Arg Pro Ser Tyr Cys Asp Ala Ala Phe Ala Leu
35 40 45

Glu Gln Ile Ser Lys Gly Lys Ala Thr Gly Arg Lys Ala Pro Leu Trp
50 55 60

Leu Arg Ala Lys Phe Gln Arg Leu Leu Phe Lys Leu Gly Cys Tyr Ile
65 70 75 80

Gln Lys Asn Cys Gly Lys Phe Leu Val Val Gly Leu Leu Ile Phe Gly
85 90 95

Ala Phe Ala Val Gly Leu Lys Ala Ala Asn Leu Glu Thr Asn Val Glu
100 105 110

Glu Leu Trp Val Glu Val Gly Gly Arg Val Ser Arg Glu Leu Asn Tyr
115 120 125

Thr Arg Gln Lys Ile Gly Glu Glu Ala Met Phe Asn Pro Gln Leu Met
130 135 140

Ile Gln Thr Pro Lys Glu Glu Gly Ala Asn Val Leu Thr Thr Glu Ala
145 150 155 160

Leu Leu Gln His Leu Asp Ser Ala Leu Gln Ala Ser Arg Val His Val
165 170 175

Tyr Met Tyr Asn Arg Gln Trp Lys Leu Glu His Leu Cys Tyr Lys Ser
180 185 190

Gly Glu Leu Ile Thr Glu Thr Gly Tyr Met Asp Gln Ile Ile Glu Tyr
195 200 205

Leu Tyr Pro Cys Leu Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly
210 215 220

Ala Lys Leu Gln Ser Gly Thr Ala Tyr Leu Leu Gly Lys Pro Pro Leu
225 230 235 240

Arg Trp Thr Asn Phe Asp Pro Leu Glu Phe Leu Glu Glu Leu Lys Lys
245 250 255

Ile Asn Tyr Gln Val Asp Ser Trp Glu Glu Met Leu Asn Lys Ala Glu
260 265 270

Val Gly His Gly Tyr Met Asp Arg Pro Cys Leu Asn Pro Ala Asp Pro
275 280 285

Asp Cys Pro Ala Thr Ala Pro Asn Lys Asn Ser Thr Lys Pro Leu Asp
290 295 300

Val Ala Leu Val Leu Asn Gly Gly Cys Gln Gly Leu Ser Arg Lys Tyr
305 310 315 320

Met His Trp Gln Glu Glu Leu Ile Val Gly Gly Thr Val Lys Asn Ala
325 330 335

Thr Gly Lys Leu Val Ser Ala His Ala Leu Gln Thr Met Phe Gln Leu
340 345 350

Met Thr Pro Lys Gln Met Tyr Glu His Phe Arg Gly Tyr Asp Tyr Val
355 360 365

Ser His Ile Asn Trp Asn Glu Asp Arg Ala Ala Ala Ile Leu Glu Ala
370 375 380

Trp Gln Arg Thr Tyr Val Glu Val Val His Gln Ser Val Ala Pro Asn
385 390 395 400

Ser Thr Gln Lys Val Leu Pro Phe Thr Thr Thr Thr Leu Asp Asp Ile
405 410 415

Leu Lys Ser Phe Ser Asp Val Ser Val Ile Arg Val Ala Ser Gly Tyr
420 425 430

Leu Leu Met Leu Ala Tyr Ala Cys Leu Thr Met Leu Arg Trp Asp Cys
435 440 445

Ser Lys Ser Gln Gly Ala Val Gly Leu Ala Gly Val Leu Leu Val Ala
450 455 460

Leu Ser Val Ala Ala Gly Leu Gly Leu Cys Ser Leu Ile Gly Ile Ser
465 470 475 480

Phe Asn Ala Ala Thr Thr Gln Val Leu Pro Phe Leu Ala Leu Gly Val
485 490 495

Gly Val Asp Asp Val Phe Leu Leu Ala His Ala Phe Ser Glu Thr Gly
500 505 510

Gln Asn Lys Arg Ile Pro Phe Glu Asp Arg Thr Gly Glu Cys Leu Lys
515 520 525

Arg Thr Gly Ala Ser Val Ala Leu Thr Ser Ile Ser Asn Val Thr Ala
530 535 540

Phe Phe Met Ala Ala Leu Ile Pro Ile Pro Ala Leu Arg Ala Phe Ser
545 550 555 560

Leu Gln Ala Ala Val Val Val Val Phe Asn Phe Ala Met Val Leu Leu
565 570 575

Ile Phe Pro Ala Ile Leu Ser Met Asp Leu Tyr Arg Arg Glu Asp Arg
580 585 590

Arg Leu Asp Ile Phe Cys Cys Phe Thr Ser Pro Cys Val Ser Arg Val
595 600 605

Ile Gln Val Glu Pro Gln Ala Tyr Thr Glu Pro His Ser Asn Thr Arg
610 615 620

Tyr Ser Pro Pro Pro Pro Tyr Thr Ser His Ser Phe Ala His Glu Thr
625 630 635 640

His Ile Thr Met Gln Ser Thr Val Gln Leu Arg Thr Glu Tyr Asp Pro
645 650 655

His Thr His Val Tyr Tyr Thr Thr Ala Glu Pro Arg Ser Glu Ile Ser
660 665 670

Val Gln Pro Val Thr Val Thr Gln Asp Asn Leu Ser Cys Gln Ser Pro
675 680 685

Glu Ser Thr Ser Ser Thr Arg Asp Leu Leu Ser Gln Phe Ser Asp Ser
690 695 700

Ser Leu His Cys Leu Glu Pro Pro Cys Thr Lys Trp Thr Leu Ser Ser
705 710 715 720

Phe Ala Glu Lys His Tyr Ala Pro Phe Leu Leu Lys Pro Lys Ala Lys
725 730 735

Val Val Val Ile Leu Leu Phe Leu Gly Leu Leu Gly Val Ser Leu Tyr
740 745 750

Gly Thr Thr Arg Val Arg Asp Gly Leu Asp Leu Thr Asp Ile Val Pro
755 760 765

Arg Glu Thr Arg Glu Tyr Asp Phe Ile Ala Ala Gln Phe Lys Tyr Phe
770 775 780

Ser Phe Tyr Asn Met Tyr Ile Val Thr Gln Lys Ala Asp Tyr Pro Asn
785 790 795 800

Ile Gln His Leu Leu Tyr Asp Leu His Lys Ser Phe Ser Asn Val Lys
805 810 815

Tyr Val Met Leu Glu Glu Asn Lys Gln Leu Pro Gln Met Trp Leu His
820 825 830

Tyr Phe Arg Asp Trp Leu Gln Gly Leu Gln Asp Ala Phe Asp Ser Asp
835 840 845

Trp Glu Thr Gly Arg Ile Met Pro Asn Asn Tyr Lys Asn Gly Ser Asp
850 855 860

Asp Gly Val Leu Ala Tyr Lys Leu Leu Val Gln Thr Gly Ser Arg Asp
865 870 875 880

Lys Pro Ile Asp Ile Ser Gln Leu Thr Lys Gln Arg Leu Val Asp Ala
885 890 895

Asp Gly Ile Ile Asn Pro Ser Ala Phe Tyr Ile Tyr Leu Thr Ala Trp
900 905 910

Val Ser Asn Asp Pro Val Ala Tyr Ala Ala Ser Gln Ala Asn Ile Arg
915 920 925

Pro His Arg Pro Glu Trp Val His Asp Lys Ala Asp Tyr Met Pro Glu
930 935 940

Thr Arg Leu Arg Ile Pro Ala Ala Glu Pro Ile Glu Tyr Ala Gln Phe
945 950 955 960

Pro Phe Tyr Leu Asn Gly Leu Arg Asp Thr Ser Asp Phe Val Glu Ala
965 970 975

Ile Glu Lys Val Arg Val Ile Cys Asn Asn Tyr Thr Ser Leu Gly Leu
980 985 990

Ser Ser Tyr Pro Asn Gly Tyr Pro Phe Leu Phe Trp Glu Gln Tyr Ile
995 1000 1005

Ser Leu Arg His Trp Leu Leu Leu Ser Ile Ser Val Val Leu Ala Cys
1010 1015 1020

Thr Phe Leu Val Cys Ala Val Phe Leu Leu Asn Pro Trp Thr Ala Gly
1025 1030 1035 1040

Ile Ile Val Met Val Leu Ala Leu Met Thr Val Glu Leu Phe Gly Met
1045 1050 1055

Met Gly Leu Ile Gly Ile Lys Leu Ser Ala Val Pro Val Val Ile Leu
1060 1065 1070

Ile Ala Ser Val Gly Ile Gly Val Glu Phe Thr Val His Val Ala Leu
1075 1080 1085

Ala Phe Leu Thr Ala Ile Gly Asp Lys Asn His Arg Ala Met Leu Ala
1090 1095 1100

Leu Glu His Met Phe Ala Pro Val Leu Asp Gly Ala Val Ser Thr Leu
1105 1110 1115 1120

Leu Gly Val Leu Met Leu Ala Gly Ser Glu Phe Asp Phe Ile Val Arg
1125 1130 1135

Tyr Phe Phe Ala Val Leu Ala Ile Leu Thr Val Leu Gly Val Leu Asn
1140 1145 1150

Gly Leu Val Leu Leu Pro Val Leu Leu Ser Phe Phe Gly Pro Cys Pro
1155 1160 1165

Glu Val Ser Pro Ala Asn Gly Leu Asn Arg Leu Pro Thr Pro Ser Pro
1170 1175 1180

Glu Pro Pro Pro Ser Val Val Arg Phe Ala Val Pro Pro Gly His Thr
1185 1190 1195 1200

Asn Asn Gly Ser Asp Ser Ser Asp Ser Glu Tyr Ser Ser Gln Thr Thr
1205 1210 1215

Val Ser Gly Ile Ser Glu Glu Leu Arg Gln Tyr Glu Ala Gln Gln Gly
1220 1225 1230

Ala Gly Gly Pro Ala His Gln Val Ile Val Glu Ala Thr Glu Asn Pro
1235 1240 1245

Val Phe Ala Arg Ser Thr Val Val His Pro Asp Ser Arg His Gln Pro
1250 1255 1260

Pro Leu Thr Pro Arg Gln Gln Pro His Leu Asp Ser Gly Ser Leu Ser
1265 1270 1275 1280

Pro Gly Arg Gln Gly Gln Gln Pro Arg Arg Asp Pro Pro Arg Glu Gly
1285 1290 1295

Leu Arg Pro Pro Pro Tyr Arg Pro Arg Arg Asp Ala Phe Glu Ile Ser
1300 1305 1310

Thr Glu Gly His Ser Gly Pro Ser Asn Arg Asp Arg Ser Gly Pro Arg
1315 1320 1325

Gly Ala Arg Ser His Asn Pro Arg Asn Pro Thr Ser Thr Ala Met Gly
1330 1335 1340

Ser Ser Val Pro Ser Tyr Cys Gln Pro Ile Thr Thr Val Thr Ala Ser
1345 1350 1355 1360



Ala Ser Val Thr Val Ala Val His Pro Pro Pro Gly Pro Gly Arg Asn
1365 1370 1375

Pro Arg Gly Gly Pro Cys Pro Gly Tyr Glu Ser Tyr Pro Glu Thr Asp
1380 1385 1390

His Gly Val Phe Glu Asp Pro His Val Pro Phe His Val Arg Cys Glu
1395 1400 1405

So*'" ^ ^ fi" Glu Leu Gln Asp Val Glu Cys
1410 1415 1420

Glu Glu Arg Pro Trp Gly Ser Ser Ser Asn
1425 1430

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly
1 5 10

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Leu Ile Val Gly Gly
1 5

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Pro Phe Phe Trp Glu Gln Tyr
1 5
(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

GGACGAATTC AARGTNCAYC ARYTNTGG
(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

GGACGAATTC CYTCCCARAA RCANTC
(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

CCACCAATTC YTNGANTCYT TYTGGGA
(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

CATACCAGCC AAGCTTGTCN GGCCARTGCA T
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5288 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

GAATTCCGGC GACCGCAAGG AGTGCCGCGG AAGCGCCCCA AGGACACGCT CGCTCGGCGC 60 

GCCGGCTCTC GCTCTTCCGC GAACTGGATG TGGGCAGCGG CGGCCGCAGA GACCTCGGGA 120 

CCCCCGCGCA ATGTGGCAAT GGAAGGCGCA GCGTCTGACT CCCCGGCAGC GGCCGCGGCC 180 

GCAGCCGCAG CAGCGCCCGC CGTGTGAGCA GCAGCAGCGC CTGGTCTGTC AACCGGAGCC 240 

CGAGCCCGAG CAGCCTGCGG CCAGCAGCGT CCTCGCAAGC CGAGCCCCCA GGCGCGCCAG 300 

GACCCCGCAG CAGCGGCAGC AGCGCGCCGG GCCGCCCGGG AAGCCTCCGT CCCCGCGGCG 360 

GCGGCGGCGG CGGCGGCGGC AACATGCCCT CGGCTGGTAA CGCCGCCGAG CCCCAGGACC 420 

GCGGCGGCGG CGGCAGCGGC TGTATCGGTG CCCCGGGACG GCCCGCTGGA GGCGGCAGGC 480 

CCAGACGGAC GGGGGGGCTG CGCCGTGCTG CCGCGCCGGA CCGGCACTAT CTGCACCGGC 540 

CCAGCTACTG CGACGCCGCC TTCGCTCTGG AGCAGATTTC CAAGGGGAAG GCTACTGGCC 600 

GCAAAGCGCC ACTGTGCCTG AGAGCGAAGT TTCAGAGACT CTTATTTAAA CTGGGTTGTT 660 

ACATTCAAAA AAACTGCGGC AAGTTCTTGG TTGTGGGCCT CCTCATATTT GGGGCCTTCG 720

CGCTGGGATT AAAAGCAGCG AACCTCGAGA CCAACGTGGA GGAGCTGTGG GTGGAAGTTG 780

GAGGACGAGT AAGTCGTGAA TTAAATTATA CTCGCCAGAA GATTGGAGAA GAGGCTATGT 840 

TTAATCCTCA ACTCATGATA CAGACCCCTA AAGAAGAAGG TGCTAATGTC CTGACCACAG 900 

AAGCGCTCCT ACAACACCTG GACTCGGCAC TCCAGGCCAG CCGTGTCCAT GTATACATGT 960 

ACAACAGCCA GTGGAAATTG GAACATTTGT GTTACAAATC AGGAGAGCTT ATCACAGAAA 1020 

CAGGTTACAT GGATCAGATA ATAGAATATC TTTACCCTTG TTTGATTATT ACACCTTTGG 1080 

ACTGCTTCTG GGAAGGGCCG AAATTACAGT CTGGGACAGC ATACCTCCTA GGTAAACCTC 1140 

CTTTGCGGTG GACAAACTTC GACCCTTTGG AATTCCTGGA AGAGTTAAAG AAAATAAACT 1200 

ATCAAGTGGA CAGCTGGGAG GAAATGCTGA ATAAGGCTGA GGTTGGTCAT GGTTACATGG 1260 

ACCGCCCCTG CCTCAATCCG GCCGATCCAG ACTGCCCCGC CACAGCCCCC AACAAAAATT 1320 

CAACCAAACC TCTTGATATG GCCCTTGTTT TGAATGGTGG ATGTCATGGC TTATCCAGAA 1380 

AGTATATGCA CTGGCAGGAG GAGTTGATTG TGGGTGGCAC AGTCAAGAAC AGCACTGGAA 1440 

AACTCGTCAG CGCCCATGCC CTGCAGACCA TGTTCCAGTT AATGACTCCC AAGCAAATCT 1500 

ACGAGCACTT CAAGGGGTAC GAGTATGTCT CACACATCAA CTGGAACGAG GACAAAGCGG 1560 

CAGCCATCCT GGAGGCCTGG CAGAGGACAT ATGTGGAGGT GGTTCATCAG AGTGTCGCAC 1620 

AGAACTCCAC TCAAAAGGTG CTTTCCTTCA CCACCACGAC CCTGGACGAC ATCCTGAAAT 1680 

CCTTCTCTGA CGTCAGTGTC ATCCGCGTGG CCAGCGGCTA CTTACTCATG CTCGCCTATG 1740 

CCTGTCTAAC CATGCTGCGC TGGGACTGCT CCAAGTCCCA GGGTGCCGTG GGGCTGGCTG 1800 

GCGTCCTGCT GGTTGCACTG TCAGTGGCTG CAGGACTGGG CCTGTGCTCA TTGATCGGAA 1860 

TTTCCTTTAA CGCTGCAACA ACTCAGCTTT TGCCATTTCT CGCTCTTGGT GTTGGTGTGG 1920 

ATGATGTTTT TCTTCTGGCC CACGCCTTCA GTGAAACAGG ACAGAATAAA AGAATCCCTT 1980 

TTGAGGACAG GACCGGGGAG TGCCTGAAGC GCACAGGAGC CAGCGTGGCC CTCACGTCCA 2040 

TCAGCAATGT CACAGCCTTC TTCATCGCCG CGTTAATCCC AATTCCCGCT CTGCGGGCGT 2100

TCTCCCTCCA GGCAGCGGTA GTAGTGGTGT TCAATTTTGC CATGGTTCTG CTCATTTTTC 2160 

CTGCAATTCT CAGCATGGAT TTATATCGAC GCGAGGACAG GAGACTGGAT ATTTTCTGCT 2220 

GTTTTACAAG CCCCTGCGTC AGCAGAGTGA TTCAGGTTGA ACCTCAGGCC TACACCGACA 2280 

CACACGACAA TACCCGCTAC AGCCCCCCAC CTCCCTACAG CAGCCACAGC TTTGCCCATG 2340 

AAACGCAGAT TACCATGCAG TCCACTGTCC AGCTCCGCAC GGAGTACGAC CCCCACACGC 2400 

ACGTGTACTA CACCACCCCT GAGCCGCGCT CCGAGATCTC TGTGCAGCCC GTCACCGTGA 2460 






WO 96/11260 PCT/US95/13233

CACAGGACAC CCTCAGCTGC CAGAGCCGAG ACAGCACCAG CTCCACAAGG GACCTGCTCT 2520 

CCCAGTTCTC CGACTCCAGC CTCCACTGCC TCGAGCCCCC CTGTACGAAG TGGACACTCT 2580

CATCTTTTGC TGAGAAGCAC TATGCTCCTT TCCTCTTGAA ACCAAAAGCC AAGGTAGTGG 2640 

TGATCTTCCT TTTTCTGCGC TTGCTGGGGG TCAGCCTTTA TGGCACCACC CGAGTGAGAG 2700 

ACGGGCTGGA CCTTACGGAC ATTGTACCTC GGGAAACCAG AGAATATGAC TTTATTGCTG 2760 

GAGAATTCAA ATACTTTTCT TTCTACAACA TGTATATAGT CACCCAGAAA GCAGACTACC 2820 

CGAATATCCA GCACTTACTT TACGACCTAC ACAGGAGTTT CAGTAACGTG AAGTATGTGA 2880 

TGTTGGAAGA AAACAAACAG CTTCCCAAAA TGTGGCTGCA CTACTTCAGA GACTGGCTTC 2940 

AGGGACTTCA GGATGCATTT GACAGTGACT GGGAAACCGG GAAAATCATG CCAAACAATT 3000 

ACAAGAATGG ATCAGACGAT GGAGTCCTTG CCTACAAACT CCTCGTGCAA ACCGGCAGCC 3060 

GCGATAAGCC CATCGACATC AGCCAGTTGA CTAAACAGCG TCTGGTGGAT GCAGATGGGA 3120 

TCATTAATCC CAGCCCTTTC TACATCTACC TGACGGCTTG GGTCAGCAAC GACCCCGTCG 3180 

CGTATGCTGC CTCCCAGGCC AACATCCGGC CACACCGACC AGAATGGGTC CACGACAAAG 3240 

CCGACTACAT GCCTGAAACA AGGCTGAGAA TCCCGGCAGC AGAGCCCATC GAGTATGCCC 3300

AGTTCCCTTT CTACCTCAAC GGGTTGCGGG ACACCTCAGA CTTTGTGGAG GCAATTGAAA 3360 

AAGTAAGGAC CATCTGCAGC AACTATACGA GCCTGGGGCT GTCCAGTTAC CCCAACGGCT 3420 

ACCCCTTCCT CTTCTGGGAG CAGTACATCG GCCTCCGCCA CTGGCTGCTG CTCTTCATCA 3480 

GCGTGGTGTT GGCCTGCACA TTCCTCGTGT GCGCTGTCTT CCTTCTGAAC CCCTGGACGG 3540 

CCGGGATCAT TGTGATGGTC CTGCCGCTCA TGACGGTCGA GCTGTTCGGC ATGATGGGCC 3600

TCATCGGAAT CAAGCTCAGT GCCGTGCCCG TGGTCATCCT GATCGCTTCT GTTGGCATAG 3660 

GAGTGGAGTT CACCGTTCAC GTTGCTTTGG CCTTTCTGAC GGCCATCGGC GACAAGAACC 3720 

GCAGGGCTGT CCTTGCCCTG GAGGACATGT TTGCACCCGT CCTGGATGGC GCCGTGTCCA 3780 

CTCTGCTGGG AGTGCTGATG CTGGCGGGAT CTGAGTTCGA CTTGATTGTC AGGTATTTCT 3840 

TTGCTGTGCT GGCGATCCTC ACCATCCTCG GCGTTCTCAA TGCGCTGCTT TTGCTTCCCG 3900 

TGCTTTTGTC TTTCTTTGGA CCATATCCTG AGGTGTCTCC AGCGAACGGC TTGAACCGCC 3960 

TGCCCACACC CTCCCCTGAG CCACCCCCCA GCGTGGTCCG CTTCGCCATG CCGCCCGCCC 4020

ACACGCACAG CGGGTCTGAT TCCTCCGACT CGGAGTATAG TTCCCAGACG ACAGTGTCAG 4080 

GCCTCAGCGA GGAGCTTCGG CACTACGAGG CCCAGCAGGG CGCGGGAGGC CCTGCCCACC 4140 

AAGTGATCGT GGAAGCCACA GAAAACCCCG TCTTCGCCCA CTCCACTGTG GTCCATCCCG 4200

AATCCAGGCA TCACCCACCC TCGAACCCGA GACAGCAGCC CCACCTGGAC TCAGGGTCCC 4260

TGCCTCCCGG ACGGCAAGGC CACCAGCCCC GCAGGGACCC CCCCAGAGAA GGCTTGTGGC 4320

CACCCCTCTA CAGACCGCGC AGACACGCTT TTGAAATTTC TACTGAAGGG CATTCTGGCC 4380

CTAGCAATAG GGCCCGCTGG GGCCCTCGCG GGGCCCGTTC TCACAACCCT CGGAACCCAG 4440

CGTCCACTGC CATGGGCAGC TCCGTGCCCG GCTACTGCCA GCCCATCACC ACTGTGACGG 4500

CTTCTGCCTC CGTGACTGTC GCCGTGCACC CGCCGCCTGT CCCTGGGCCT GGGCGGAACC 4560

CCCGAGGGGG ACTCTGCCCA GGCTACCCTG AGACTGACCA CGGCCTGTTT GAGGACCCCC 4620

ACGTGCCTTT CCACGTCCGG TGTGAGAGGA GGGATTCGAA GGTGGAAGTC ATTGAGCTGC 4680

AGGACGTGGA ATGCGAGGAG AGGCCCCGGG GAAGCAGCTC CAACTGAGGG TGATTAAAAT 4740

CTGAAGCAAA GAGGCCAAAG ATTGGAAACC CCCCACCCCC ACCTCTTTCC AGAACTGCTT 4800

GAAGAGAACT GGTTGGAGTT ATGGAAAAGA TGCCCTGTGC CAGGACAGCA GTTCATTGTT 4860

ACTGTAACCG ATTGTATTAT TTTGTTAAAT ATTTCTATAA ATATTTAAGA GATGTACACA 4920

TGTGTAATAT AGGAAGGAAC GATGTAAAGT GGTATGATCT GGGGCTTCTC CACTCCTGCC 4980

CCAGAGTGTG GAGGCCACAG TGGGGCCTCT CCGTATTTGT GCATTGGGCT CCGTGCCACA 5040

ACCAAGCTTC ATTAGTCTTA AATTTCAGCA TATGTTGCTG CTGCTTAAAT ATTGTATAAT 5100

TTACTTGTAT AATTCTATGC AAATATTGCT TATGTAATAG GATTATTTTG TAAAGGTTTC 5160

TGTTTAAAAT ATTTTAAATT TGCATATCAC AACCCTGTGG TAGTATGAAA TGTTACTGTT 5220

AACTTTCAAA CACGCTATGC GTGATAATTT TTTTGTTTAA TGAGCAGATA TGAAGAAAGC 5280

CCGGAATT 5288

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1447 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Met Ala Ser Ala Gly Asn Ala Ala Glu Pro Gln Asp Arg Gly Gly Gly
1 5 10 15

Gly Ser Gly Cys Ile Gly Ala Pro Gly Arg Pro Ala Gly Gly Gly Arg
20 25 30




Arg Arg Arg Thr Gly Gly Leu Arg Arg Ala Ala Ala Pro Asp Arg Asp 
35 40 45 

Tyr Leu His Arg Pro Ser Tyr Cys Asp Ala Ala Phe Ala Leu Glu Gln
50 55 60

Ile Ser Lys Gly Lys Ala Thr Gly Arg Lys Ala Pro Leu Trp Leu Arg
65 70 75 80

Ala Lys Phe Gln Arg Leu Leu Phe Lys Leu Gly Cys Tyr Ile Gln Lys
85 90 95

Asn Cys Gly Lys Phe Leu Val Val Gly Leu Leu Ile Phe Gly Ala Phe
100 105 110

Ala Val Gly Leu Lys Ala Ala Asn Leu Glu Thr Asn Val Glu Glu Leu 
115 120 125 

Trp Val Glu Val Gly Gly Arg Val Ser Arg Glu Leu Asn Tyr Thr Arg 
130 135 140 

Gln Lys Ile Gly Glu Glu Ala Met Phe Asn Pro Gln Leu Met Ile Gln
145 150 155 160

Thr Pro Lys Glu Glu Gly Ala Asn Val Leu Thr Thr Glu Ala Leu Leu 
165 170 175 

Gln His Leu Asp Ser Ala Leu Gln Ala Ser Arg Val His Val Tyr Met
180 185 190

Tyr Asn Arg Gln Trp Lys Leu Glu His Leu Cys Tyr Lys Ser Gly Glu
195 200 205

Leu Ile Thr Glu Thr Gly Tyr Met Asp Gln Ile Ile Glu Tyr Leu Tyr
210 215 220

Pro Cys Leu Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly Ala Lys
225 230 235 240

Leu Gln Ser Gly Thr Ala Tyr Leu Leu Gly Lys Pro Pro Leu Arg Trp
245 250 255

Thr Asn Phe Asp Pro Leu Glu Phe Leu Glu Glu Leu Lys Lys Ile Asn
260 265 270

Tyr Gln Val Asp Ser Trp Glu Glu Met Leu Asn Lys Ala Glu Val Gly
275 280 285 

His Gly Tyr Met Asp Arg Pro Cys Leu Asn Pro Ala Asp Pro Asp Cys 
290 295 300 



Pro Ala Thr Ala Pro Asn Lys Asn Ser Thr Lys Pro Leu Asp Met Ala 
305 310 315 320 

Leu Val Leu Asn Gly Gly Cys His Gly Leu Ser Arg Lys Tyr Met His
325 330 335 

Trp Gln Glu Glu Leu Ile Val Gly Gly Thr Val Lys Asn Ser Thr Gly
340 345 350

Lys Leu Val Ser Ala His Ala Leu Gln Thr Met Phe Gln Leu Met Thr
355 360 365

Pro Lys Gln Met Tyr Glu His Phe Lys Gly Tyr Glu Tyr Val Ser His
370 375 380

Ile Asn Trp Asn Glu Asp Lys Ala Ala Ala Ile Leu Glu Ala Trp Gln
385 390 395 400

Arg Thr Tyr Val Glu Val Val His Gln Ser Val Ala Gln Asn Ser Thr
405 410 415

Gln Lys Val Leu Ser Phe Thr Thr Thr Thr Leu Asp Asp Ile Leu Lys
420 425 430

Ser Phe Ser Asp Val Ser Val Ile Arg Val Ala Ser Gly Tyr Leu Leu
435 440 445

Met Leu Ala Tyr Ala Cys Leu Thr Met Leu Arg Trp Asp Cys Ser Lys
450 455 460

Ser Gln Gly Ala Val Gly Leu Ala Gly Val Leu Leu Val Ala Leu Ser
465 470 475 480

Val Ala Ala Gly Leu Gly Leu Cys Ser Leu Ile Gly Ile Ser Phe Asn
485 490 495

Ala Ala Thr Thr Gln Val Leu Pro Phe Leu Ala Leu Gly Val Gly Val
500 505 510

Asp Asp Val Phe Leu Leu Ala His Ala Phe Ser Glu Thr Gly Gln Asn
515 520 525

Lys Arg Ile Pro Phe Glu Asp Arg Thr Gly Glu Cys Leu Lys Arg Thr
530 535 540

Gly Ala Ser Val Ala Leu Thr Ser Ile Ser Asn Val Thr Ala Phe Phe
545 550 555 560

Met Ala Ala Leu Ile Pro Ile Pro Ala Leu Arg Ala Phe Ser Leu Gln
565 570 575

Ala Ala Val Val Val Val Phe Asn Phe Ala Met Val Leu Leu Ile Phe
580 585 590

Pro Ala Ile Leu Ser Met Asp Leu Tyr Arg Arg Glu Asp Arg Arg Leu
595 600 605

Asp Ile Phe Cys Cys Phe Thr Ser Pro Cys Val Ser Arg Val Ile Gln
610 615 620

Val Glu Pro Gln Ala Tyr Thr Asp Thr His Asp Asn Thr Arg Tyr Ser
625 630 635 640

Pro Pro Pro Pro Tyr Ser Ser His Ser Phe Ala His Glu Thr Gln Ile
645 650 655

Thr Met Gln Ser Thr Val Gln Leu Arg Thr Glu Tyr Asp Pro His Thr
660 665 670

His Val Tyr Tyr Thr Thr Ala Glu Pro Arg Ser Glu Ile Ser Val Gln
675 680 685

Pro Val Thr Val Thr Gln Asp Thr Leu Ser Cys Gln Ser Pro Glu Ser
690 695 700

Thr Ser Ser Thr Arg Asp Leu Leu Ser Gln Phe Ser Asp Ser Ser Leu
705 710 715 720

His Cys Leu Glu Pro Pro Cys Thr Lys Trp Thr Leu Ser Ser Phe Ala
725 730 735 

Glu Lys His Tyr Ala Pro Phe Leu Leu Lys Pro Lys Ala Lys Val Val 
740 745 750 

Val Ile Phe Leu Phe Leu Gly Leu Leu Gly Val Ser Leu Tyr Gly Thr
755 760 765

Thr Arg Val Arg Asp Gly Leu Asp Leu Thr Asp Ile Val Pro Arg Glu
770 775 780

Thr Arg Glu Tyr Asp Phe Ile Ala Ala Gln Phe Lys Tyr Phe Ser Phe
785 790 795 800

Tyr Asn Met Tyr Ile Val Thr Gln Lys Ala Asp Tyr Pro Asn Ile Gln
805 810 815 

His Leu Leu Tyr Asp Leu His Arg Ser Phe Ser Asn Val Lys Tyr Val 
820 825 830 

Met Leu Glu Glu Asn Lys Gln Leu Pro Lys Met Trp Leu His Tyr Phe
835 840 845

Arg Asp Trp Leu Gln Gly Leu Gln Asp Ala Phe Asp Ser Asp Trp Glu
850 855 860

Thr Gly Lys Ile Met Pro Asn Asn Tyr Lys Asn Gly Ser Asp Asp Gly
865 870 875 880

Val Leu Ala Tyr Lys Leu Leu Val Gln Thr Gly Ser Arg Asp Lys Pro
885 890 895

Ile Asp Ile Ser Gln Leu Thr Lys Gln Arg Leu Val Asp Ala Asp Gly
900 905 910

Ile Ile Asn Pro Ser Ala Phe Tyr Ile Tyr Leu Thr Ala Trp Val Ser
915 920 925

Asn Asp Pro Val Ala Tyr Ala Ala Ser Gln Ala Asn Ile Arg Pro His
930 935 940 

Arg Pro Glu Trp Val His Asp Lys Ala Asp Tyr Met Pro Glu Thr Arg 
945 950 955 960 



Leu Arg Ile Pro Ala Ala Glu Pro Ile Glu Tyr Ala Gln Phe Pro Phe
965 970 975

Tyr Leu Asn Gly Leu Arg Asp Thr Ser Asp Phe Val Glu Ala Ile Glu
980 985 990

Lys Val Arg Thr Ile Cys Ser Asn Tyr Thr Ser Leu Gly Leu Ser Ser
995 1000 1005

Tyr Pro Asn Gly Tyr Pro Phe Leu Phe Trp Glu Gln Tyr Ile Gly Leu
1010 1015 1020

Arg His Trp Leu Leu Leu Phe Ile Ser Val Val Leu Ala Cys Thr Phe
1025 1030 1035 1040

Leu Val Cys Ala Val Phe Leu Leu Asn Pro Trp Thr Ala Gly Ile Ile
1045 1050 1055 

Val Met Val Leu Ala Leu Met Thr Val Glu Leu Phe Gly Met Met Gly 
1060 1065 1070 

Leu Ile Gly Ile Lys Leu Ser Ala Val Pro Val Val Ile Leu Ile Ala
1075 1080 1085

Ser Val Gly Ile Gly Val Glu Phe Thr Val His Val Ala Leu Ala Phe
1090 1095 1100

Leu Thr Ala Ile Gly Asp Lys Asn Arg Arg Ala Val Leu Ala Leu Glu
1105 1110 1115 1120

His Met Phe Ala Pro Val Leu Asp Gly Ala Val Ser Thr Leu Leu Gly
1125 1130 1135

Val Leu Met Leu Ala Gly Ser Glu Phe Asp Phe Ile Val Arg Tyr Phe
1140 1145 1150

Phe Ala Val Leu Ala Ile Leu Thr Ile Leu Gly Val Leu Asn Gly Leu
1155 1160 1165

Val Leu Leu Pro Val Leu Leu Ser Phe Phe Gly Pro Tyr Pro Glu Val 
1170 1175 1180

Ser Pro Ala Asn Gly Leu Asn Arg Leu Pro Thr Pro Ser Pro Glu Pro
1185 1190 1195 1200

Pro Pro Ser Val Val Arg Phe Ala Met Pro Pro Gly His Thr His Ser
1205 1210 1215

Gly Ser Asp Ser Ser Asp Ser Glu Tyr Ser Ser Gln Thr Thr Val Ser
1220 1225 1230

Gly Leu Ser Glu Glu Leu Arg His Tyr Glu Ala Gln Gln Gly Ala Gly
1235 1240 1245

Gly Pro Ala His Gln Val Ile Val Glu Ala Thr Glu Asn Pro Val Phe
1250 1255 1260

Ala His Ser Thr Val Val His Pro Glu Ser Arg His His Pro Pro Ser
1265 1270 1275 1280

Asn Pro Arg Gln Gln Pro His Leu Asp Ser Gly Ser Leu Pro Pro Gly
1285 1290 1295

Arg Gln Gly Gln Gln Pro Arg Arg Asp Pro Pro Arg Glu Gly Leu Trp
1300 1305 1310

Pro Pro Leu Tyr Arg Pro Arg Arg Asp Ala Phe Glu Ile Ser Thr Glu
1315 1320 1325 

Gly His Ser Gly Pro Ser Asn Arg Ala Arg Trp Gly Pro Arg Gly Ala 
1330 1335 1340 

Arg Ser His Asn Pro Arg Asn Pro Ala Ser Thr Ala Met Gly Ser Ser 
1345 1350 1355 1360 

Val Pro Gly Tyr Cys Gln Pro Ile Thr Thr Val Thr Ala Ser Ala Ser
1365 1370 1375

Val Thr Val Ala Val His Pro Pro Pro Val Pro Gly Pro Gly Arg Asn
1380 1385 1390

Pro Arg Gly Gly Leu Cys Pro Gly Tyr Pro Glu Thr Asp His Gly Leu
1395 1400 1405

Phe Glu Asp Pro His Val Pro Phe His Val Arg Cys Glu Arg Arg Asp
1410 1415 1420

Ser Lys Val Glu Val Ile Glu Leu Gln Asp Val Glu Cys Glu Glu Arg
1425 1430 1435 1440 



Pro Arg Gly Ser Ser Ser Asn 
1445 

WHAT IS CLAIMED IS:

1. A DNA sequence other than present in a chromosome encoding a patched gene
other than the Drosophila patched gene or fragment thereof of at least about 12 bp
different from the sequence of the Drosophila patched gene.

2. A DNA sequence according to Claim 1, wherein said patched gene is a 
mammalian gene. 

3. A DNA sequence according to Claim 1 for human, mouse, mosquito, butterfly
or beetle patched gene.

4. A DNA sequence according to Claim 3, wherein said DNA sequence is a 
human sequence. 


5. A DNA sequence according to Claim 4, wherein said DNA sequence is a 
mouse sequence. 

6. A DNA sequence according to Claim 1, wherein said DNA sequence is a
fragment of at least about 18 bp.

7. A DNA sequence according to Claim 1 joined to a DNA sequence comprising 
a restriction enzyme recognition sequence. 

8. An expression cassette comprising a transcriptional initiation region functional
in an expression host, a DNA sequence according to Claim 1 under the 
transcriptional regulation of said transcriptional initiation region, and a 
transcriptional termination region functional in said expression host. 

9. An expression cassette according to Claim 8, wherein said transcriptional
initiation region is heterologous to said DNA sequence according to Claim 1. 

10. An expression cassette according to Claim 8, wherein said transcriptional
initiation region is homologous to said DNA sequence according to Claim 1 and
includes the enhancer region.

11. A cell comprising an expression cassette according to Claim 8 as part of an
extrachromosomal element or integrated into the genome of a host cell as a result of 
introduction of said expression cassette into said host cell and the cellular progeny of 
said host cell. 

12. A cell according to Claim 11, further comprising the patched protein in the
cellular membrane of said cell. 

13. A cell according to Claim 11, wherein said patched protein is a mouse patched 
protein. 


14. A cell according to Claim 11, wherein said patched gene is a human patched
protein.

15. A cell according to Claim 11, wherein said transcriptional initiation region is a
Drosophila patched gene transcriptional initiation region comprising the promoter
and enhancer joined to a heterologous gene.

16. A cell comprising an expression cassette comprising a transcriptional initiation
region functional in an expression host, said transcriptional initiation region
consisting of a 5' non-coding region regulating the transcription of patched protein
comprising the promoter and enhancer, a marker gene, and a transcriptional
termination region, as part of an extrachromosomal element or integrated into the
genome of a host cell as a result of introduction of said expression cassette into said
host, and the cellular progeny thereof.

17. A cell according to Claim 16, wherein said transcriptional initiation region is
the Drosophila region. 

18. A method for following embryonic development employing patched
protein in an embryo, said method comprising:

integrating an expression cassette comprising a transcriptional initiation region
functional in embryonic host cells, said transcriptional initiation region consisting of
a 5' non-coding region regulating the transcription of patched protein, a marker
gene, and a transcriptional termination region, wherein said embryonic host cells are
capable of developing into a fetus;

growing said embryonic host cells, whereby proliferation and differentiation
occur; and

locating cells comprising expression of patched protein by means of
expression of said marker gene.

19. A method for producing patched protein, said method comprising:

growing a cell according to Claim 11, whereby said patched protein is
expressed; and

isolating said patched protein free of other proteins.

20. A method for screening candidate compounds for binding affinity to the
patched protein, said method comprising:

combining said candidate protein with a vertebrate or invertebrate cell
comprising said patched protein in the membrane of said cell and an expression
cassette comprising a transcriptional initiation region functional in said cell, a DNA
sequence according to Claim 1 comprising the entire coding sequence under the
transcriptional regulation of said transcriptional initiation region, and a
transcriptional termination region functional in said cell, expressing patched
protein in said cell; and

assaying for the binding of said candidate compound to said patched protein.

21. A method for screening candidate compounds for agonist activity with the
patched protein, said method comprising:

combining said candidate protein with a vertebrate or invertebrate cell
comprising said patched protein in the membrane of said cell and an expression
cassette comprising a transcriptional initiation region functional in an expression
host, said transcriptional initiation region consisting of a 5' non-coding region
regulating the transcription of patched protein, a marker gene, and a transcriptional
termination region, as part of an extrachromosomal element or integrated into the
genome of a host cell; and

assaying for the expression of said marker gene.

22. A monoclonal antibody binding specifically to a patched protein, other than 
the Drosophila patched protein. 